AI Site Grade
oddbox.co.uk — AI Site Grade
Oddbox's sitemap directive in robots.txt points to a 404 URL, blocking AI crawler discovery of 1,351 pages.
Oddbox.co.uk has strong crawler access and a large sitemap, but a broken sitemap reference, missing schema on key pages, future-dated blog posts, and no llms.txt limit its AI visibility.
- Findings: 8
- Evidence checks: 30
- Completed: 30 May 2026
Analysis
Oddbox.co.uk — AI-Visibility Audit
The sitemap referenced in robots.txt (/sitemap/sitemap-index.xml) returns a 404, while the actual sitemap lives at /sitemap-index.xml and contains 1,351 URLs — meaning every AI crawler following the robots.txt directive hits a dead end for discovery.
Crawler Access
All major AI bots — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended, ChatGPT-User, and anthropic-ai — receive a 200 with full HTML content identical to the browser baseline. No UA-based blocking, no Cloudflare challenge, no thin JS shell. The site runs on Netlify with X-Frame-Options: DENY but no CSP or permissions policy. The robots.txt has a single catch-all rule disallowing search-parameter URLs and no AI-specific directives at all — no Disallow: / for any bot, no crawl-delay, no mention of GPTBot or ClaudeBot by name. llms.txt returns a 404 (Gatsby error page, 80KB of CSS/JS).
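A check of this kind can be approximated with a short script. The sketch below is a minimal illustration, not the audit's actual tooling: the short user-agent tokens, the `fetch` and `matches_baseline` helpers, and the 5% size tolerance are all assumptions. It compares status code and body size only, which is a coarser test than a full content comparison.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Hypothetical short tokens; real crawlers send longer, versioned UA strings.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> Tuple[int, int]:
    """Return (HTTP status, body length in bytes) for one request."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a body worth measuring.
        return error.code, len(error.read())


def matches_baseline(baseline: Tuple[int, int], candidate: Tuple[int, int],
                     tolerance: float = 0.05) -> bool:
    """True when statuses match and body sizes differ by at most `tolerance`."""
    base_status, base_size = baseline
    status, size = candidate
    if status != base_status:
        return False
    if base_size == 0:
        return size == 0
    return abs(size - base_size) / base_size <= tolerance
```

In use, one would call `fetch` once with a browser user agent to get the baseline, then once per entry in `AI_USER_AGENTS`, and flag any bot for which `matches_baseline` returns False.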
Cold-Knowledge Gap
The model's prior knowledge is broadly accurate but stale on specifics. It knows Oddbox was founded in 2016 by Emilie Vanpoperinghe and Deepak Ravindran, rescues wonky produce, and has saved "over 50,000 tonnes" (the site now says over 65,000 tonnes). The model mentions Trustpilot ratings around 4.5 stars and notes delivery delays in 2023-2024. What the model does not know: Oddbox has rebranded with an orange colour scheme, expanded into eggs through a partnership with St. Ewe, launched "The Market" for refillables and rescued non-produce items, and published a Do Good Report 2025. The site's positioning has shifted from "wonky veg box" to "good food doing good", a broader mission the model has not absorbed.
Schema Posture
Structured data appears in only two places. The homepage carries a single Corporation schema with name, URL, sameAs links, and a phone contact point. Blog posts use NewsArticle schema with datePublished and author, though the author URL field contains the string "undefined/authors/oddbox-team", a broken relative reference. Every other key page (/our-mission, /how-oddbox-works, /our-growers, /fruit-and-veg, /recipes, /sustainability, /blog) has zero JSON-LD. No FAQPage, Product, ItemList, BreadcrumbList, or Organization schema appears on any subpage.
Content Anomalies
Blog posts carry future dates: "Our Do Good Report 2025" is dated 29 April 2026, "Egg-citing News" 17 February 2026, "Growing Pains" 23 January 2026, and "The Future is Odd" 1 January 2026. The Wayback Machine confirms these pages existed as early as January 2026, so the dates are either deliberately set forward or the result of a CMS misconfiguration in Prismic/Gatsby. Either way, AI crawlers ingesting datePublished from the NewsArticle schema may treat these as future content and could deprioritise or ignore them. The /sustainability page also shows conflicting donation figures: the body text gives 1,967,526 meals, while the excerpt still carries an older number (1,760,281), suggesting copy-versioning drift.
Findings
Robots.txt sitemap directive points to a 404 URL (severity: High)
The sitemap URL in robots.txt (/sitemap/sitemap-index.xml) returns 404, while the actual sitemap at /sitemap-index.xml contains 1,351 URLs. AI crawlers following the directive hit a dead end.
What to change: Update the sitemap directive in robots.txt to point to https://www.oddbox.co.uk/sitemap-index.xml.
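The change amounts to swapping one line in robots.txt. The sketch below shows one way to do it programmatically. The `fix_sitemap_directive` helper is hypothetical, and the `current` text is an illustrative stand-in for the live file, whose exact rules are not reproduced in this report.

```python
def fix_sitemap_directive(robots_text: str, sitemap_url: str) -> str:
    """Drop every existing Sitemap line and append one pointing at `sitemap_url`."""
    kept = [
        line for line in robots_text.splitlines()
        if not line.strip().lower().startswith("sitemap:")
    ]
    kept.append(f"Sitemap: {sitemap_url}")
    return "\n".join(kept) + "\n"


# Illustrative stand-in for the live robots.txt; the real rules may differ.
current = (
    "User-agent: *\n"
    "Disallow: /search\n"
    "Sitemap: https://www.oddbox.co.uk/sitemap/sitemap-index.xml\n"
)
fixed = fix_sitemap_directive(current, "https://www.oddbox.co.uk/sitemap-index.xml")
print(fixed)
```

After deploying the change, requesting the URL named in the Sitemap line should return a 200 rather than a 404.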
llms.txt returns 404 with heavy error page (severity: Medium)
The llms.txt file is missing, returning a 404 Gatsby error page (80KB of CSS/JS). This prevents AI crawlers from quickly discovering key resources.
What to change: Create an llms.txt file listing key pages and resources for AI crawlers.
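A starting point for the file could be generated from the list of key pages. This sketch assumes the layout suggested by the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links). The `build_llms_txt` helper, the section name, the summary text, and the page labels (derived from the URL paths) are placeholders to be replaced with approved copy.

```python
from typing import List, Tuple

BASE_URL = "https://www.oddbox.co.uk"


def build_llms_txt(title: str, summary: str, pages: List[Tuple[str, str]]) -> str:
    """Render a minimal llms.txt: H1 title, blockquote summary, one link section."""
    lines = [f"# {title}", "", f"> {summary}", "", "## Key pages", ""]
    for path, label in pages:
        lines.append(f"- [{label}]({BASE_URL}{path})")
    return "\n".join(lines) + "\n"


llms_txt = build_llms_txt(
    "Oddbox",
    "Placeholder summary: replace with one or two sentences of approved copy.",
    [
        # Labels are placeholders derived from the URL paths.
        ("/our-mission", "Our mission"),
        ("/how-oddbox-works", "How Oddbox works"),
        ("/our-growers", "Our growers"),
        ("/fruit-and-veg", "Fruit and veg"),
        ("/recipes", "Recipes"),
        ("/sustainability", "Sustainability"),
        ("/blog", "Blog"),
    ],
)
print(llms_txt)
```

The output would be served as plain text at /llms.txt in place of the current 404 page.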
Robots.txt lacks AI-specific directives (severity: Low)
The robots.txt has a single catch-all rule disallowing search-parameter URLs and no AI-specific directives. No mention of GPTBot, ClaudeBot, or other AI bots, and no crawl-delay.
What to change: Consider adding explicit directives for AI bots to manage crawl rate and access.
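Before publishing bot-specific rules, it can help to confirm how a parser would read them. The sketch below feeds a hypothetical robots.txt (one named GPTBot group plus a catch-all, with an invented `/search` rule and a 5-second delay) to Python's standard `urllib.robotparser`. Crawl-delay is a non-standard directive, so individual crawlers may not honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI group plus the catch-all group.
ROBOTS_TXT = """\
User-agent: GPTBot
Crawl-delay: 5
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Content pages stay open to the named bot; search URLs stay blocked.
gptbot_can_read_mission = parser.can_fetch("GPTBot", "https://www.oddbox.co.uk/our-mission")
gptbot_can_read_search = parser.can_fetch("GPTBot", "https://www.oddbox.co.uk/search?q=veg")

# Only the named group carries a delay; other bots fall back to the catch-all.
gptbot_delay = parser.crawl_delay("GPTBot")
claudebot_delay = parser.crawl_delay("ClaudeBot")
```

The standard-library parser matches rules by simple path prefix, so wildcard patterns would need a different checker.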
Key pages lack structured data (severity: High)
Pages like /our-mission, /how-oddbox-works, /our-growers, /fruit-and-veg, /recipes, /sustainability, and /blog have zero JSON-LD. No FAQPage, Product, ItemList, BreadcrumbList, or Organization schema.
What to change: Add appropriate JSON-LD structured data (e.g., Organization, BreadcrumbList, Product, FAQPage) to all key pages.
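As one example of the structured data involved, the sketch below builds a schema.org BreadcrumbList for a subpage. The `breadcrumb_jsonld` helper and the breadcrumb names are assumptions for illustration. The resulting JSON would be embedded in a `<script type="application/ld+json">` tag in the page head.

```python
import json
from typing import List, Tuple

BASE_URL = "https://www.oddbox.co.uk"


def breadcrumb_jsonld(trail: List[Tuple[str, str]]) -> str:
    """Serialise a (name, path) trail as a schema.org BreadcrumbList."""
    items = [
        {
            "@type": "ListItem",
            "position": position,  # positions are 1-based
            "name": name,
            "item": f"{BASE_URL}{path}",
        }
        for position, (name, path) in enumerate(trail, start=1)
    ]
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }
    return json.dumps(data, indent=2)


# Breadcrumb names are placeholders; use the labels shown on the live page.
snippet = breadcrumb_jsonld([("Home", "/"), ("Our mission", "/our-mission")])
print(snippet)
```

The same pattern extends to FAQPage or Product markup, with each type's own required properties.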
Blog NewsArticle schema has broken author URL (severity: Medium)
Blog posts use NewsArticle schema with an author URL field containing the string 'undefined/authors/oddbox-team', a broken relative reference.
What to change: Fix the author URL in NewsArticle schema to use an absolute URL or remove the field if not applicable.
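A build-time check could catch this class of error before it ships. The sketch below flags author URLs that are not absolute http(s) URLs. The helper names and the sample article's author type and name are hypothetical. Only the broken URL string comes from the audit.

```python
from urllib.parse import urlparse


def is_absolute_http_url(value: str) -> bool:
    """True for URLs with an http(s) scheme and a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def broken_author_urls(article: dict) -> list:
    """Return author url values in a JSON-LD article that are not absolute."""
    authors = article.get("author", [])
    if isinstance(authors, dict):
        authors = [authors]  # schema.org allows a single author object
    return [
        author["url"]
        for author in authors
        if isinstance(author, dict)
        and "url" in author
        and not is_absolute_http_url(str(author["url"]))
    ]


# Sample article: author type and name are invented; the url is the audited value.
article = {
    "@type": "NewsArticle",
    "author": {
        "@type": "Organization",
        "name": "Oddbox Team",
        "url": "undefined/authors/oddbox-team",
    },
}
broken = broken_author_urls(article)
```

The "undefined" prefix suggests a base-URL variable that was unset at build time, though the audit does not confirm the cause.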
Blog posts have future publication dates (severity: High)
Multiple blog posts carry dates in 2026 (e.g., 'Our Do Good Report 2025' dated 29 April 2026). AI crawlers may deprioritise or ignore content marked as future.
What to change: Correct datePublished to each post's actual publication date. If a post is revised later, record that in dateModified rather than moving datePublished forward.
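A simple guard can flag posts whose datePublished falls after the date on which they are crawled. In the sketch below, the `is_future_dated` helper, the ISO 8601 string form of the dates, and the crawl date of 15 January 2026 are assumptions chosen for illustration.

```python
from datetime import date, datetime


def is_future_dated(date_published: str, today: date) -> bool:
    """True when an ISO 8601 datePublished value falls after `today`."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    normalised = date_published.replace("Z", "+00:00")
    published = datetime.fromisoformat(normalised).date()
    return published > today


# Hypothetical crawl date, used only to illustrate the comparison.
crawl_date = date(2026, 1, 15)
report_is_future = is_future_dated("2026-04-29", crawl_date)
new_year_post_is_future = is_future_dated("2026-01-01", crawl_date)
```

Running such a check in the publishing pipeline would show whether the forward dates come from editorial scheduling or from the CMS.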
Sustainability page has conflicting donation figures (severity: Low)
The /sustainability page body text states '1,967,526 meals' donated, while the excerpt shows '1,760,281', indicating copy-versioning drift.
What to change: Reconcile the donation figures across the page to ensure consistency.
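Drift of this kind can be detected automatically by extracting every meal count from a page's text fields and checking that they agree. The sketch below assumes the figures appear as comma-grouped numbers followed by the word "meals". The `meal_counts` helper and the sample sentences are invented, and only the two numbers come from the audit.

```python
import re

# Matches comma-grouped integers such as "1,967,526" followed by "meals".
MEALS_PATTERN = re.compile(r"(\d{1,3}(?:,\d{3})+)\s+meals")


def meal_counts(text: str) -> set:
    """Return the distinct meal totals mentioned in `text` as integers."""
    return {int(match.replace(",", "")) for match in MEALS_PATTERN.findall(text)}


# Sample wording is invented; only the figures are from the audit.
body_text = "Together we have donated 1,967,526 meals to charity partners."
excerpt = "So far our community has donated 1,760,281 meals."

distinct_totals = meal_counts(body_text) | meal_counts(excerpt)
figures_conflict = len(distinct_totals) > 1
```

Sourcing the figure from a single CMS field would make the check unnecessary.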
AI model knowledge is stale on recent developments (severity: Medium)
The model's prior knowledge lacks Oddbox's rebranding, egg partnership, 'The Market' expansion, and Do Good Report 2025. The site's mission shift to 'good food doing good' is not reflected.
What to change: Publish an llms.txt and ensure key updates are well-represented in crawlable content and structured data.
What's working
- All major AI bots receive full HTML content — GPTBot, ClaudeBot, PerplexityBot, and others receive a 200 with full HTML, identical to browser baseline. No UA-based blocking or JS shell.
- Sitemap contains 1,351 URLs for discovery — The actual sitemap at /sitemap-index.xml lists 1,351 URLs, providing a comprehensive index for crawlers.
- Homepage has Organization structured data — The homepage includes a Corporation schema with name, URL, sameAs links, and phone contact point.
- Blog posts use NewsArticle schema — Blog posts include NewsArticle schema with datePublished and author fields, aiding AI understanding of content.
- All pages render full HTML content — Key pages like /how-oddbox-works, /our-mission, /blog, and /recipes return substantial HTML content, not thin JS shells.