AI Site Grade

mozzocoffee.com — AI Site Grade

Mozzo Coffee's live site is invisible to every AI crawler because of a Cloudflare JS challenge wall, and LLM cold knowledge gets the company's founding details wrong.

Mozzo Coffee's site returns 403 to all AI crawlers, serves no reachable robots.txt or llms.txt, lacks schema on key pages, and has near-zero external web presence. Meanwhile, LLM prior knowledge incorrectly places the founding in Leeds in 1999 rather than Southampton in 2005.

Findings: 7
Evidence checks: 27
Completed: 30 May 2026

Analysis

Mozzo Coffee: Invisible to Every AI Crawler, Contradictory Cold Knowledge

The live site at mozzocoffee.com answers every tested request with HTTP 403 and a Cloudflare JS challenge page. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, and even a standard Chrome user-agent all receive the same "Verifying your connection" page with zero visible content. Any client that cannot complete the JavaScript challenge sees nothing, so the site is functionally invisible to the entire AI ecosystem.

Crawler Access

All 11 tested user-agents (Browser baseline, GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, Bytespider, Applebot-Extended) return HTTP 403 with ~8.8KB of Cloudflare challenge HTML and no actual page content. The /robots.txt and /llms.txt endpoints also return 403, so neither file is served to crawlers; whether either file exists behind the challenge could not be verified. The domain resolves to 23.227.38.32 (Cloudflare/Shopify), and the site runs on Shopify's platform behind Cloudflare's "Managed Challenge" security layer. No AI crawler can index a single page.
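
The check described above can be approximated with a short script. The following is a minimal sketch, not the tool this report used: the user-agent strings are simplified placeholders (real crawlers send longer strings), the `classify`, `probe`, and `audit` names are hypothetical, and the marker text is the phrase quoted from the challenge page in this report.

```python
import urllib.error
import urllib.request

# Simplified, illustrative user-agent strings (assumption: real crawlers
# send longer, versioned strings).
USER_AGENTS = {
    "Browser baseline": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36",
    "GPTBot": "GPTBot",
    "ClaudeBot": "ClaudeBot",
    "PerplexityBot": "PerplexityBot",
}

# Phrase this report quotes from the challenge page.
CHALLENGE_MARKER = "Verifying your connection"


def classify(status: int, body: str) -> str:
    """Label a response as 'ok', 'challenge', or 'blocked'."""
    if status == 200:
        return "ok"
    if status == 403 and CHALLENGE_MARKER in body:
        return "challenge"
    return "blocked"


def probe(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch url with the given User-Agent and return (status, body text)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return error.code, error.read().decode("utf-8", "replace")


def audit(url: str) -> dict[str, str]:
    """Probe url once per user-agent and classify each response."""
    return {name: classify(*probe(url, ua)) for name, ua in USER_AGENTS.items()}
```

Calling `audit("https://mozzocoffee.com/")` performs live requests. If the findings in this report still hold, every entry would be expected to come back as `challenge`.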

Cold-Knowledge Gap

The LLM's prior knowledge describes Mozzo Coffee as a specialty roaster founded in 1999 in Leeds, supplying over 1,000 UK hospitality businesses with wholesale coffee. The actual site (via Wayback Machine snapshot from October 2025) tells a different story: Mozzo was founded in 2005 as a wind- and solar-powered coffee cart, began roasting in 2010, and is based in Southampton (Mozzo World, Chancerygate Business Centre). The cold knowledge is wrong on founding year (1999 vs 2005), wrong on location (Leeds vs Southampton), and wrong on founding narrative (roaster vs coffee cart). The LLM also knows nothing about the C2C Fund (raised £169,541 as of May 2025), the Rebuild Women's Hope initiative, or the Idjwi Island, DRC community projects — which are the site's primary brand differentiators.

Content & Schema Posture

The homepage carries a single LocalBusiness JSON-LD block with name, address, phone, email, and social links, but it has no @id, description, priceRange, or openingHours, and no product page carries Product schema. The wholesale, sustainability, and C2C Fund pages have no schema markup at all. The site has a blog ("Bean Bulletin"), brew guides, and recipes, but none carry Article schema, and the FAQ-worthy content (sustainability claims, C2C Fund mechanics) has no FAQPage schema. The homepage heading structure is flat (one H1, many H2s) with no nested hierarchy.
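
To make the missing LocalBusiness properties concrete, here is a minimal sketch that renders a fuller JSON-LD block. It is illustrative only: the `local_business_jsonld` helper and the `#organization` fragment used for `@id` are assumptions, and every field value marked PLACEHOLDER must be replaced with the business's real details.

```python
import json


def local_business_jsonld(site_url: str, fields: dict) -> str:
    """Render a LocalBusiness JSON-LD <script> tag from the supplied fields."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        # Assumption: a stable fragment identifier on the site root.
        "@id": site_url.rstrip("/") + "/#organization",
        "url": site_url,
    }
    # Skip empty values so the markup never carries blank properties.
    data.update({key: value for key, value in fields.items() if value})
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values only; none of these are taken from the live site.
markup = local_business_jsonld(
    "https://mozzocoffee.com/",
    {
        "name": "Mozzo Coffee",
        "description": "PLACEHOLDER: one-sentence description",
        "priceRange": "PLACEHOLDER",
        "openingHours": "",  # left empty here, so it is omitted
    },
)
print(markup)
```

The same pattern could be extended to Product, Article, or FAQPage types by changing `@type` and the supplied fields.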

External Signals

The brand has near-zero external web presence in search results. DuckDuckGo returns zero indexed results for "mozzocoffee.com", "Mozzo Coffee UK", "Mozzo Coffee review", or "Mozzo Coffee Southampton". No Trustpilot, no Reddit threads, no press coverage appear in search. The only discoverable external footprint is the Wayback Machine snapshot and the DNS TXT records (Google site verification, Facebook domain verification, Klaviyo). This means AI models have almost no third-party signals to triangulate against, making the cold-knowledge errors persistent and uncorrectable without live site access.

Findings

  1. Cloudflare JS challenge wall blocks all AI crawlers and browsers High

    The live site at mozzocoffee.com returns HTTP 403 with a Cloudflare challenge page to every tested user-agent, including GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and a standard browser user-agent. No AI crawler can access any page content.

    What to change: Configure Cloudflare to allow AI crawler user-agents (GPTBot, ClaudeBot, etc.) through the JS challenge, or serve a static version of the site to bots.

  2. robots.txt and llms.txt endpoints return 403 High

    Both /robots.txt and /llms.txt return HTTP 403, meaning no crawler directives or AI-specific content files are served. The site provides no guidance to crawlers.

    What to change: Create and serve a robots.txt that allows AI crawlers, and publish an llms.txt file summarizing the site's content for LLMs.

  3. LLM cold knowledge contradicts site facts on founding year, location, and origin story High

    LLM prior knowledge states Mozzo Coffee was founded in 1999 in Leeds as a roaster, but the site (via Wayback Machine) indicates a 2005 founding in Southampton as a coffee cart, with roasting starting in 2010. The C2C Fund and community projects are absent from LLM knowledge.

    What to change: Make the live site accessible to AI crawlers so that retrieval systems and future training runs can pick up the accurate founding details and community initiatives.

  4. Key pages lack structured data markup High

    The wholesale, sustainability, and C2C Fund pages have zero schema markup. The homepage has a LocalBusiness schema but it is missing @id, description, priceRange, and openingHours. No Product, Article, or FAQ schema exists anywhere.

    What to change: Add Product schema to product pages, Article schema to blog posts, FAQ schema to pages with Q&A content, and complete the LocalBusiness schema with all recommended properties.

  5. Near-zero external web presence in search results High

    DuckDuckGo returns zero indexed results for the domain, brand name, or related queries. No Trustpilot, Reddit, or press coverage appears. The only discoverable external signals are DNS TXT records and a Wayback Machine snapshot.

    What to change: Build external backlinks, claim business listings (Google Business Profile, Trustpilot), and publish press releases or guest posts to create third-party signals.

  6. Blog and brew guides inaccessible to crawlers Medium

    The blog (Bean Bulletin) and brew guides are behind the Cloudflare wall, so AI crawlers cannot index this content. The blog page timed out even via Wayback Machine.

    What to change: Allow AI crawlers through Cloudflare and ensure blog pages are statically rendered for bots.

  7. Flat heading structure on homepage Low

    The homepage uses one H1 and multiple H2s with no nested hierarchy, which reduces semantic clarity for crawlers.

    What to change: Implement a hierarchical heading structure (H1 > H2 > H3) to improve content semantics.
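
The robots.txt recommendation in finding 2 can be sketched as follows. This is a hedged example rather than a recommended policy: the `build_robots_txt` and `allowed` helpers are hypothetical, the `/cart` path is an assumed example of a page to keep out of crawls, and the agent list is only a subset of the tokens tested in this report. The standard library's `urllib.robotparser` is used to confirm the generated rules behave as intended. Such a file only helps once the 403 challenge no longer blocks the /robots.txt endpoint itself.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this report; extend or trim as policy requires.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
             "PerplexityBot", "Google-Extended", "Applebot-Extended"]


def build_robots_txt(agents: list[str], disallow: list[str]) -> str:
    """Return a robots.txt that lets each agent crawl everything except `disallow`."""
    lines = []
    for agent in agents:
        lines.append(f"User-agent: {agent}")
        # Disallow rules come first because Python's parser applies the
        # first matching rule in each group.
        lines.extend(f"Disallow: {path}" for path in disallow)
        lines.append("Allow: /")
        lines.append("")
    return "\n".join(lines)


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "/cart" is a hypothetical path chosen for illustration.
robots_txt = build_robots_txt(AI_AGENTS, ["/cart"])
print(robots_txt)
```

Checking a few URLs with `allowed(robots_txt, "GPTBot", ...)` before publishing is a cheap way to catch rule-ordering mistakes.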

What's working

  • LocalBusiness schema present on homepage — The homepage includes a LocalBusiness JSON-LD schema with name, address, phone, email, and social links, providing basic business information to search engines and AI crawlers that can access it.
  • C2C Fund page with detailed community impact content — The C2C Fund page contains 611 words detailing fundraising (£169,541 raised), community projects in Idjwi Island, DRC, and the Rebuild Women's Hope initiative, offering strong brand differentiation content.
  • Sustainability page with environmental claims — The sustainability page (230 words) describes wind- and solar-powered operations and ethical sourcing, providing content that could be enhanced with schema.
  • Wholesale page with detailed service information — The wholesale page (437 words) outlines coffee supply services for hospitality businesses, including equipment and training, which is valuable for B2B AI queries.
