watchhouse.com — AI Site Grade
WatchHouse's Cloudflare JS-challenge wall blocks every AI crawler, making the entire site invisible to GPTBot, ClaudeBot, and all major AI engines.
WatchHouse.com returns 403 to every AI crawler, serves no readable robots.txt or sitemap, and lacks Product schema, leaving AI models to rely on stale off-domain knowledge.
- Findings: 9
- Evidence checks: 25
- Completed: 30 May 2026
Analysis
WatchHouse: A Shopify site rendered invisible to every AI crawler
Every AI crawler tested (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Applebot-Extended, Bytespider, and ChatGPT-User) receives a 403 Cloudflare JS-challenge page on every URL, including the homepage, product pages, robots.txt, sitemap.xml, and llms.txt. The site is a Shopify storefront (DNS points to 23.227.38.65, in Shopify's IP range; MX via Outlook; Cloudflare NS), but Cloudflare's "Under Attack" or JS-challenge mode is blocking all non-browser traffic indiscriminately. During this audit, no AI crawler user-agent received a single byte of real content.
Crawler Access
compare_bot_access on https://watchhouse.com returned 403 for all 11 user-agents tested, including a plain browser UA. Requests for robots.txt and llms.txt also return 403 with Cloudflare's "Verifying your connection..." page, so no crawler can read a robots.txt file, whether or not one exists behind the challenge. The sitemap.xml is equally inaccessible. The site has effectively zero AI-crawl surface area. The Wayback Machine snapshot from May 2026 confirms the site has real content (product listings, subscription tiers, location data), but the live site serves only a JS-challenge shell to every automated agent.
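The access check described above can be approximated with the Python standard library. This is a minimal sketch, not the tool the audit used: the user-agent strings are simplified placeholders rather than the vendors' exact strings, and the function names are illustrative. The challenge marker is the "Verifying your connection" text quoted in this report.

```python
import urllib.error
import urllib.request

# Simplified placeholder user-agent strings (assumption: real crawlers
# send longer strings that contain these tokens).
USER_AGENTS = {
    "GPTBot": "GPTBot",
    "ClaudeBot": "ClaudeBot",
    "PerplexityBot": "PerplexityBot",
    "OAI-SearchBot": "OAI-SearchBot",
    "Bytespider": "Bytespider",
    "ChatGPT-User": "ChatGPT-User",
}

# Text of the Cloudflare interstitial quoted in the report.
CHALLENGE_MARKER = "verifying your connection"


def classify(status: int, body: str) -> str:
    """Label a response as 'challenge', 'blocked', 'ok', or 'other'."""
    if status == 403 and CHALLENGE_MARKER in body.lower():
        return "challenge"
    if status in (401, 403):
        return "blocked"
    if 200 <= status < 300:
        return "ok"
    return "other"


def fetch(url: str, user_agent: str, timeout: float = 10.0):
    """Fetch url with the given User-Agent; return (status, body text)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return err.code, err.read().decode("utf-8", "replace")


def compare(url: str) -> dict:
    """Map each crawler name to the classification of its response."""
    return {name: classify(*fetch(url, ua)) for name, ua in USER_AGENTS.items()}
```

Calling `compare("https://watchhouse.com/")` would make one request per user-agent. A result where every value is "challenge" would match the finding reported here.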
Cold-Knowledge Gap
The LLM knows WatchHouse as a London-based specialty coffee roaster and cafe chain founded in 2011 by James Dickson, with architect-designed locations (including a converted Victorian public toilet in Clerkenwell), a subscription service, and a sustainability focus. This prior knowledge is detailed and accurate — but it was acquired from off-domain sources (reviews, press, social media), not from the site itself. The site's actual content includes locations in New York and Dubai (per the Wayback snapshot), which the cold model did not mention. The live site's product catalog (coffee beans, pods, equipment, subscriptions, matcha, chocolate drinks) and its "Point of Origin" documentary series are entirely invisible to AI engines.
Schema Posture
The homepage carries a single Organization JSON-LD block with name, URL, email, and sameAs links to Facebook, Instagram, and LinkedIn. The coffee collection page also has a minimal Organization schema. No Product, BreadcrumbList, FAQPage, LocalBusiness, or ItemList schemas are present, a notable gap for a Shopify store selling 20+ SKUs across multiple locations. The sameAs array on the collection page has empty strings for several entries, indicating broken schema hydration.
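A schema inventory like the one above can be reproduced from saved page HTML. The sketch below collects the declared `@type` values from JSON-LD script blocks and reports which of the types named in this section are absent. The class and function names are illustrative, and it reads only top-level `@type` values, not nested or `@graph` entries.

```python
import json
from html.parser import HTMLParser

# Schema types named in this section of the report.
EXPECTED_TYPES = {
    "Organization", "Product", "BreadcrumbList",
    "FAQPage", "LocalBusiness", "ItemList",
}


class JsonLdCollector(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the audit


def schema_types(html: str) -> set:
    """Return the set of top-level @type values declared in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for block in collector.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            declared = item.get("@type") if isinstance(item, dict) else None
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
    return found


def missing_types(html: str) -> set:
    """Return the expected schema types that the page does not declare."""
    return EXPECTED_TYPES - schema_types(html)
```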
External Signals
The cold LLM knowledge is the strongest external signal available. DuckDuckGo returned zero search results for queries combining "watchhouse.com" with coffee, London, or specialty terms — suggesting the domain has minimal indexed web presence even for human search. The Wayback Machine holds a snapshot from March 2026 showing a fully functional Shopify site with detailed product pages, subscription options, and location listings for London, New York, and Dubai. The gap between the archived content and the live 403 wall means AI engines describing WatchHouse today are working from stale, third-hand information.
Findings
Cloudflare JS-challenge wall blocks all AI crawlers (High)
Every AI crawler tested (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Applebot-Extended, Bytespider, ChatGPT-User) receives a 403 Cloudflare JS challenge on every URL, including the homepage, product pages, robots.txt, sitemap.xml, and llms.txt. None of the crawlers tested received real content.
What to change: Disable JS-challenge mode for known AI crawler user-agents in Cloudflare WAF, or serve a static HTML version to those bots.
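One way to act on this recommendation is a WAF rule whose expression matches the crawler user-agents and exempts them from the challenge. The sketch below only builds such an expression string. It assumes Cloudflare's `http.user_agent` field and `contains` operator, and the helper name is illustrative. Google-Extended and Applebot-Extended are left out on the understanding that they are robots.txt control tokens, not strings sent in request headers. User-agent matching can be spoofed, so a verified-bot signal may be a safer basis where the plan offers one.

```python
# Crawler tokens from this report that are expected to appear in
# request User-Agent headers (assumption noted in the lead-in).
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "OAI-SearchBot",
    "Bytespider",
    "ChatGPT-User",
]


def build_user_agent_expression(tokens):
    """Join one 'contains' clause per token with 'or'."""
    clauses = []
    for token in tokens:
        # Reject characters that would break out of the quoted string.
        if '"' in token or "\\" in token:
            raise ValueError(f"unsafe token: {token!r}")
        clauses.append(f'(http.user_agent contains "{token}")')
    return " or ".join(clauses)


if __name__ == "__main__":
    print(build_user_agent_expression(AI_CRAWLER_TOKENS))
```

The printed expression is meant to be pasted into a rule that skips the challenge for matching requests. Check the rule's action and ordering against current Cloudflare documentation before deploying it.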
No robots.txt file served to any crawler (High)
Requests for robots.txt return a 403 Cloudflare wall, so no crawler can read the file, whether or not one exists at the origin. Crawlers therefore cannot discover allowed paths, and the failure signals poor crawl governance.
What to change: Create and serve a robots.txt file that allows AI crawlers to access key pages (e.g., /collections/, /pages/).
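As a rough illustration of this recommendation, the sketch below generates a robots.txt with one group per crawler token named in this report and checks the result with the standard-library parser. The paths come from the recommendation above, the sitemap URL is an assumption, and `/collections/coffee` is a hypothetical test path. Shopify storefronts normally generate a robots.txt of their own, so a file may already exist behind the challenge and need only be exposed or adjusted.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens listed in this report.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "OAI-SearchBot", "Applebot-Extended", "Bytespider", "ChatGPT-User",
]
ALLOWED_PATHS = ["/collections/", "/pages/"]  # from the recommendation
SITEMAP_URL = "https://watchhouse.com/sitemap.xml"  # assumed location


def build_robots_txt(crawlers, allowed_paths, sitemap_url):
    """Emit one User-agent group per crawler, then a Sitemap line."""
    lines = []
    for agent in crawlers:
        lines.append(f"User-agent: {agent}")
        # With no Disallow lines, unlisted paths stay allowed by default;
        # the Allow lines only make the intent explicit.
        lines.extend(f"Allow: {path}" for path in allowed_paths)
        lines.append("")
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt, agent, url):
    """Parse the generated text and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLERS, ALLOWED_PATHS, SITEMAP_URL))
```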
Sitemap.xml inaccessible to crawlers (High)
The sitemap.xml returns a 403 Cloudflare wall, so crawlers cannot discover the site's URL structure. This severely limits indexing.
What to change: Ensure sitemap.xml is accessible to all crawlers and lists all important pages.
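For reference, a sitemap that lists important pages has the shape produced by the sketch below, which uses the sitemaps.org 0.9 namespace. The URLs passed in are hypothetical placeholders. Shopify usually generates sitemap.xml automatically, so in practice the fix may be limited to letting crawlers reach the existing file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the declaration by hand so the output is the same on
    # every Python version.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    print(build_sitemap([
        "https://watchhouse.com/",
        "https://watchhouse.com/collections/coffee",
    ]))
```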
No llms.txt file for AI guidance (Medium)
The llms.txt endpoint returns 403, so AI engines have no explicit guidance on which content to use or cite.
What to change: Create an llms.txt file that points AI crawlers to key pages and provides a summary of the site.
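A minimal llms.txt could be generated as sketched below. The layout follows the commonly proposed llms.txt convention: a title, a one-line summary in a blockquote, then sections of annotated links. The summary wording is drawn from this report, while every URL and link label is a hypothetical placeholder to be replaced with real pages.

```python
def build_llms_txt(title, summary, sections):
    """Render a title, blockquote summary, and link sections as Markdown."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical example content; URLs are placeholders, not verified pages.
EXAMPLE_SECTIONS = {
    "Shop": [
        ("Coffee", "https://watchhouse.com/collections/coffee",
         "Coffee beans and pods"),
    ],
    "Company": [
        ("Locations", "https://watchhouse.com/pages/locations",
         "Cafes in London, New York, and Dubai"),
    ],
}

if __name__ == "__main__":
    print(build_llms_txt(
        "WatchHouse",
        "London-based specialty coffee roaster and cafe chain.",
        EXAMPLE_SECTIONS,
    ))
```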
No Product or LocalBusiness schema (High)
The site has only Organization schema, on the homepage and collection page. No Product, BreadcrumbList, FAQPage, LocalBusiness, or ItemList schemas are present, despite selling 20+ SKUs across multiple locations.
What to change: Add Product schema to each product page, LocalBusiness schema for each cafe location, and BreadcrumbList to collection pages.
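The Product markup recommended here might look like the output of the sketch below. It is an illustration under assumptions, not the store's actual data: the product name, URL, description, and price in the usage example are placeholders, and the helper names are hypothetical. In a Shopify theme the same structure would normally be produced by the product template.

```python
import json


def product_jsonld(name, url, description, price, currency, in_stock=True):
    """Build a schema.org Product object with a single Offer."""
    availability = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "description": description,
        "brand": {"@type": "Brand", "name": "WatchHouse"},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": f"https://schema.org/{availability}",
            "url": url,
        },
    }


def as_script_tag(data):
    """Serialize JSON-LD for embedding in an HTML page."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = product_jsonld(
        "Example Espresso Blend",
        "https://watchhouse.com/products/example-espresso-blend",
        "Placeholder description.",
        12.5,
        "GBP",
    )
    print(as_script_tag(example))
```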
Empty sameAs entries in schema markup (Low)
The Organization schema on the coffee collection page contains empty strings in the sameAs array, indicating broken schema hydration.
What to change: Remove empty sameAs entries or populate them with valid URLs.
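The cleanup can be expressed as a small filter applied before the schema is rendered. The sketch below assumes the schema is available as a Python dict. It keeps only non-empty http(s) URLs, removes duplicates, and drops the `sameAs` key entirely when nothing valid remains. The function name and the sample URL in the usage note are illustrative.

```python
def clean_same_as(schema: dict) -> dict:
    """Return a copy of schema whose sameAs holds only valid, unique URLs."""
    cleaned = dict(schema)
    raw = schema.get("sameAs", [])
    if isinstance(raw, str):
        raw = [raw]  # schema.org allows a single string as well as a list
    urls = [
        value.strip()
        for value in raw
        if isinstance(value, str)
        and value.strip().startswith(("http://", "https://"))
    ]
    urls = list(dict.fromkeys(urls))  # de-duplicate, preserving order
    if urls:
        cleaned["sameAs"] = urls
    else:
        cleaned.pop("sameAs", None)
    return cleaned
```

For example, an input whose `sameAs` is `["https://www.instagram.com/example", "", "  "]` comes back with only the first entry, and an input whose entries are all empty comes back with no `sameAs` key.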
Zero indexed pages in web search (High)
Multiple DuckDuckGo searches for site:watchhouse.com and brand queries returned zero results, suggesting the domain has minimal to no indexed web presence, at least in that engine.
What to change: Resolve the Cloudflare block for search engine bots to allow indexing.
Cold LLM knowledge omits New York and Dubai locations (Medium)
The LLM's prior knowledge of WatchHouse includes only London locations, but the Wayback snapshot shows locations in New York and Dubai. This means AI models are missing key business information.
What to change: Ensure location pages are accessible to crawlers and include LocalBusiness schema with address data.
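Location markup with address data could be generated per cafe, as in the sketch below. It uses schema.org's `CafeOrCoffeeShop` type, a more specific subtype of LocalBusiness, with a nested `PostalAddress`. The cities come from this report, but every street, postal code, and URL in the usage example is a placeholder, and the function name is hypothetical.

```python
def cafe_jsonld(name, street, city, postal_code, country, url):
    """Build a schema.org CafeOrCoffeeShop object with a postal address."""
    return {
        "@context": "https://schema.org",
        "@type": "CafeOrCoffeeShop",
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "postalCode": postal_code,
            "addressCountry": country,
        },
        "parentOrganization": {
            "@type": "Organization",
            "name": "WatchHouse",
            "url": "https://watchhouse.com",
        },
    }


if __name__ == "__main__":
    import json

    # Placeholder address details; only the cities come from the report.
    locations = [
        ("WatchHouse London", "PLACEHOLDER STREET", "London", "PLACEHOLDER", "GB"),
        ("WatchHouse New York", "PLACEHOLDER STREET", "New York", "PLACEHOLDER", "US"),
        ("WatchHouse Dubai", "PLACEHOLDER STREET", "Dubai", "PLACEHOLDER", "AE"),
    ]
    for name, street, city, code, country in locations:
        page = "https://watchhouse.com/pages/locations"  # hypothetical URL
        print(json.dumps(cafe_jsonld(name, street, city, code, country, page), indent=2))
```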
Product catalog invisible to AI engines (High)
The site's product catalog (coffee beans, pods, equipment, subscriptions, matcha, chocolate drinks) and the 'Point of Origin' documentary series are entirely inaccessible to AI crawlers due to the 403 wall.
What to change: Allow AI crawlers to access product pages and add Product schema.
What's working
- Organization schema present on homepage — The homepage includes a valid Organization JSON-LD block with name, URL, email, and sameAs links, providing basic brand identity to crawlers that can access it.
- Detailed off-domain brand knowledge available to LLMs — The LLM has accurate prior knowledge about WatchHouse as a London-based specialty coffee roaster founded in 2011, with a subscription service and sustainability focus, sourced from external references.
- Wayback snapshot confirms rich site content — The Wayback Machine snapshot from March 2026 shows a fully functional Shopify site with detailed product pages, subscription options, and location listings for London, New York, and Dubai.