cricket.com.au — AI Site Grade

Cricket.com.au gives crawlers full access and has rich content, but a broken canonical sitemap URL, a missing llms.txt, minimal schema, and fragmented subdomains limit its AI discoverability.

Findings: 8
Evidence checks: 23
Completed: 30 May 2026

Analysis

/sitemap.xml returns 404 while /sitemap/index.xml works, so the conventional sitemap URL that many search engines and AI crawlers probe by default is broken.

Crawler Access

Every major AI crawler tested (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended) receives a 200 response whose body is byte-identical (231 KB) to a browser visit. No UA-based blocking exists. The robots.txt contains a single catch-all rule (User-agent: * Disallow:) and no AI-specific directives. The site is served through AWS CloudFront in front of a PulseLive CMS, with HSTS, CSP, and X-Frame-Options: DENY in place. There is no JS-rendering risk: all key pages serve substantive HTML text on first GET.
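The effect of the catch-all rule can be checked offline. The sketch below feeds the rule quoted above to Python's standard robots.txt parser and asks whether each listed crawler may fetch the homepage. It only evaluates the robots.txt text; it does not reproduce the HTTP checks behind the 200-response finding, and the function name allowed_agents is illustrative.

```python
from urllib.robotparser import RobotFileParser

# The catch-all rule quoted in the report. An empty Disallow value blocks nothing.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "OAI-SearchBot", "Bytespider", "Applebot-Extended",
]


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    verdicts = allowed_agents(ROBOTS_TXT, "https://www.cricket.com.au/", AI_CRAWLERS)
    for agent, ok in verdicts.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Because no group names a specific crawler, every agent falls through to the wildcard group, which is why the verdict is the same for all of them.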

Sitemap and llms.txt Gaps

The canonical sitemap URL (/sitemap.xml) returns HTTP 404. The working sitemap lives at /sitemap/index.xml, a location that robots.txt does not advertise. Instead, robots.txt references Sitemap: https://www.cricket.com.au/sitemap (no .xml), which also returns 404. The /llms.txt endpoint returns a branded 404 page ("OOPS! We dropped the ball"). No AI-friendly content map exists.
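One way the sitemap gap could be closed is sketched below: a redirect from the conventional path to the working one, plus a robots.txt that advertises the working URL. The redirect table and function names are hypothetical, and how a redirect is actually configured depends on the CloudFront and CMS setup, which this report does not cover.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

SITE = "https://www.cricket.com.au"

# Hypothetical redirect table: the conventional sitemap path points at the
# location the report found working.
REDIRECTS = {"/sitemap.xml": "/sitemap/index.xml"}


def redirect_target(path: str) -> Optional[str]:
    """Return the absolute URL to 301-redirect to, or None if path is not redirected."""
    target = REDIRECTS.get(path)
    return SITE + target if target else None


def robots_txt() -> str:
    """Keep the existing catch-all rule and advertise the working sitemap URL."""
    return "\n".join([
        "User-agent: *",
        "Disallow:",
        f"Sitemap: {SITE}/sitemap/index.xml",
        "",
    ])


def advertised_sitemaps(text: str) -> list[str]:
    """Parse robots.txt text and return the sitemap URLs it declares."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.site_maps() or []


if __name__ == "__main__":
    print(redirect_target("/sitemap.xml"))
    print(advertised_sitemaps(robots_txt()))
```

Doing both means a crawler that guesses /sitemap.xml and a crawler that reads robots.txt end up at the same index.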

Cold-Knowledge Gap

The LLM knows cricket.com.au as the official Cricket Australia hub with live scores, news, video, fixtures, and the CA Live app. It references a "Play" section with club finders and grassroots tools — but /play on the main domain returns HTTP 404. The actual grassroots portal lives on a separate subdomain (play.cricket.com.au) with its own schema (@type: Organization named "playcommunity"), which is invisible to crawlers hitting the main domain. The LLM also recalls a 2023 data breach affecting user credentials — a reputational signal the site itself does not address anywhere on its homepage or /privacy page.

Schema Posture

Apart from news articles, every page carries a single WebPage schema type with minimal properties. There is no Organization, SportsOrganization, SportsTeam, BreadcrumbList, FAQPage, or Event markup, even though the site contains match fixtures, ticket sales, player profiles, and FAQ content. News articles do use NewsArticle schema, but with an empty mentions array and an author given as a plain string with no @type. The homepage has no mainEntity or about property. The creator field carries "name": "cricket-australia" as a plain string rather than a resolvable entity.
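To illustrate what richer markup might look like, the sketch below builds two JSON-LD objects as Python dictionaries: a SportsOrganization entity with a stable @id, and a NewsArticle whose author is a typed Person and whose mentions array is populated. The @id value, headline, author name, and mentioned entities are placeholders invented for the example, not values taken from the site.

```python
import json

SITE = "https://www.cricket.com.au"
ORG_ID = f"{SITE}/#organization"  # hypothetical identifier for the organisation entity


def organization_jsonld() -> dict:
    """A resolvable organisation entity that other markup can reference by @id."""
    return {
        "@context": "https://schema.org",
        "@type": "SportsOrganization",
        "@id": ORG_ID,
        "name": "Cricket Australia",
        "url": SITE,
        "sport": "Cricket",
    }


def news_article_jsonld(headline: str, url: str, author_name: str,
                        mentioned: list[str]) -> dict:
    """NewsArticle with a typed author and a populated mentions array."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@id": ORG_ID},
        "mentions": [{"@type": "Thing", "name": name} for name in mentioned],
    }


if __name__ == "__main__":
    article = news_article_jsonld(
        headline="Example headline",               # placeholder
        url=f"{SITE}/news/example-article",        # placeholder path
        author_name="Example Author",              # placeholder
        mentioned=["Example Team", "Example Venue"],
    )
    print(json.dumps([organization_jsonld(), article], indent=2))
```

Referencing the publisher by @id rather than repeating a name string is what lets a consumer resolve both objects to the same entity.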

External Signals and Fragmentation

The RSS feed at /news/rss uses relative paths in <link> elements (e.g., /news/4511419/...) and a broken <atom:link href="https://cricket-australia/news/rss">, whose host is cricket-australia rather than www.cricket.com.au. The grassroots play.cricket.com.au subdomain operates independently with its own schema and branding, and the main site's navigation does not link to it. The Big Bash League lives at bigbash.com.au (linked from the 404 page), further fragmenting Cricket Australia's digital presence across at least three domains.
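The two feed problems can be repaired mechanically. The sketch below resolves relative <link> values against the feed URL and rewrites the self-referencing atom:link. The sample feed is a trimmed, hypothetical stand-in that reproduces both issues; the item path in it is invented, and in practice the fix belongs in the CMS template that generates the feed rather than in a post-processing step.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

FEED_URL = "https://www.cricket.com.au/news/rss"
ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed, hypothetical feed reproducing the two problems described above.
SAMPLE_FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>News</title>
    <atom:link href="https://cricket-australia/news/rss" rel="self"/>
    <item>
      <title>Example article</title>
      <link>/news/example-article</link>
    </item>
  </channel>
</rss>
"""


def fix_feed(xml_text: str, feed_url: str = FEED_URL) -> str:
    """Return the feed with absolute <link> URLs and a corrected atom:link self href."""
    ET.register_namespace("atom", ATOM_NS)  # keep the atom: prefix when serialising
    root = ET.fromstring(xml_text)
    # Plain RSS <link> elements: resolve relative paths against the feed URL.
    for link in root.iter("link"):
        if link.text:
            link.text = urljoin(feed_url, link.text.strip())
    # The Atom self link should point at the feed's real address.
    for atom_link in root.iter(f"{{{ATOM_NS}}}link"):
        if atom_link.get("rel") == "self":
            atom_link.set("href", feed_url)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fix_feed(SAMPLE_FEED))
```

Links that are already absolute pass through urljoin unchanged, so the function is safe to run on a feed that is only partly broken.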

Findings

  1. Canonical sitemap.xml returns 404 (High)

    The standard sitemap URL (/sitemap.xml) returns HTTP 404. The actual sitemap lives at /sitemap/index.xml, which is not referenced in robots.txt. The robots.txt points to /sitemap (no .xml), which also 404s.

    What to change: Create a redirect from /sitemap.xml to /sitemap/index.xml and update robots.txt to point to the correct sitemap URL.

  2. llms.txt endpoint returns 404 (High)

    The /llms.txt endpoint returns a branded 404 page. No AI-friendly content map exists, making it harder for LLMs to discover site structure.

    What to change: Create an llms.txt file that lists key pages and sections for AI crawlers.

  3. Schema markup is minimal and lacks entity types (High)

    Apart from news articles, every page uses only WebPage schema with minimal properties. No Organization, SportsOrganization, SportsTeam, BreadcrumbList, FAQPage, or Event markup exists despite match fixtures, ticket sales, and player profiles. NewsArticle schema has an empty mentions array and no author @type.

    What to change: Add Organization, SportsOrganization, BreadcrumbList, and Event schema where applicable. Enrich NewsArticle with author @type and populated mentions.

  4. Play page on main domain returns 404 (Medium)

    The /play page on cricket.com.au returns HTTP 404, but the LLM knowledge base references a Play section. The actual grassroots portal is on a separate subdomain (play.cricket.com.au) with no cross-linking.

    What to change: Restore the /play page with a redirect or content, and add cross-links to play.cricket.com.au.

  5. Digital presence fragmented across multiple domains (Medium)

    Grassroots content lives on play.cricket.com.au with separate schema and branding, and the Big Bash League is on bigbash.com.au. The main site navigation does not link to either, which dilutes authority.

    What to change: Consolidate key content under cricket.com.au or add strong cross-linking and consistent schema across subdomains.

  6. RSS feed uses relative paths and broken domain in atom:link (Medium)

    The RSS feed at /news/rss uses relative paths in <link> elements and has a broken <atom:link href="https://cricket-australia/news/rss">, which points at the host cricket-australia instead of www.cricket.com.au.

    What to change: Update RSS feed to use absolute URLs and correct the atom:link href to https://www.cricket.com.au/news/rss.

  7. Robots.txt has no AI-specific directives (Low)

    The robots.txt contains a single catch-all rule with no AI bot directives. All crawlers are unrestricted, but the access policy for AI crawlers is implicit rather than stated.

    What to change: Add explicit User-agent groups for AI crawlers (e.g., GPTBot, ClaudeBot) so that the policy for each is deliberate. Note that robots.txt can only allow or disallow paths; signalling which pages matter most is a job for the sitemap and llms.txt.

  8. LLM knowledge recalls 2023 data breach not addressed on site (Low)

    The LLM knowledge base recalls a 2023 data breach affecting user credentials, but the site does not address this on the homepage or privacy page, potentially affecting trust signals.
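As an illustration of finding 2, the sketch below renders a small llms.txt using the commonly proposed layout: an H1 title, a blockquote summary, then sections of annotated links. The URLs listed are the ones this report observed working; the https scheme on the two external hosts, the section headings, and the descriptions are assumptions, and a real file would be curated by the site owners.

```python
from typing import NamedTuple


class Link(NamedTuple):
    title: str
    url: str
    note: str


def render_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt document: H1 title, blockquote summary, link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- [{link.title}]({link.url}): {link.note}" for link in links)
        lines.append("")
    return "\n".join(lines)


# Illustrative content only; headings and notes are assumptions.
SECTIONS = {
    "Core": [
        Link("News feed", "https://www.cricket.com.au/news/rss", "RSS feed of news articles"),
        Link("Sitemap index", "https://www.cricket.com.au/sitemap/index.xml", "Full list of site URLs"),
    ],
    "Related sites": [
        Link("Play Cricket", "https://play.cricket.com.au", "Grassroots portal"),
        Link("Big Bash League", "https://bigbash.com.au", "Big Bash League site"),
    ],
}

if __name__ == "__main__":
    print(render_llms_txt(
        "Cricket Australia",
        "Official Cricket Australia hub with live scores, news, video, and fixtures.",
        SECTIONS,
    ))
```

Listing the related sites in the same file also gives AI crawlers one place to see how the three domains relate, which bears on finding 5.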

What's working

  • All major AI crawlers receive full 200 response — Every major AI crawler (GPTBot, ClaudeBot, PerplexityBot, etc.) receives a full 200 response with identical content to a browser visit. No UA-based blocking exists.
  • Key pages serve substantive HTML text on first GET — All key pages (home, news, matches, tickets) serve substantive HTML text without requiring JavaScript rendering, ensuring AI crawlers can parse content.
  • News articles use NewsArticle schema — News articles include NewsArticle schema markup, which helps search engines and AI understand the content type.
  • RSS feed available for news content — An RSS feed at /news/rss provides a syndication channel for news articles, aiding content discovery.
  • Play subdomain has its own Organization schema — The play.cricket.com.au subdomain includes Organization schema (name: 'playcommunity'), providing structured data for grassroots cricket.
