AI Site Grade
joshwoodcolour.com — AI Site Grade
Josh Wood Colour's Gatsby JS shell renders most content invisible to AI crawlers, and the brand's flagship Miracle System is absent from LLM prior knowledge.
The site's JavaScript-dependent architecture and lack of AI crawler strategy leave product content opaque to bots, while external citation signals are thin.
- Findings: 10
- Evidence checks: 30
- Completed: 30 May 2026
Analysis
Josh Wood Colour: JS-Shell Site with No AI Crawler Strategy
The site is a Gatsby JS single-page application (800KB+ HTML payload) served behind Cloudflare. Every AI crawler tested gets a 200 status and the same content as a browser, but on many pages that content is a JavaScript shell that needs client-side rendering to display meaningful product information. For crawlers that do not execute JavaScript, much of the site is functionally opaque.
Crawler Access
All 11 tested AI bots (including GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended, anthropic-ai, ChatGPT-User, and Perplexity-User) receive a 200 status and the same 799,946-byte response as a browser, so there is no user-agent-based blocking. However, robots.txt contains no AI-specific directives: there are no rules for GPTBot, ClaudeBot, or Google-Extended. /llms.txt returns a 404, with the full Gatsby JS shell served as the 404 page. /sitemap.xml also returns a 404; the actual sitemap lives at /sitemap-0.xml (765 URLs) and is not referenced from the standard location.
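The robots.txt check described above can be approximated with a short script. This is a minimal sketch, not the tool used for this report: the function name, the agent list, and the sample robots.txt body are illustrative assumptions, and the site's real robots.txt is not reproduced here.

```python
# Sketch: report which AI crawler tokens have their own User-agent group.
# AI_AGENTS and SAMPLE are assumptions for illustration only.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "OAI-SearchBot"]


def agents_with_explicit_rules(robots_txt: str, agents=AI_AGENTS) -> set:
    """Return the subset of `agents` named on a User-agent line (case-insensitive)."""
    named = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return {agent for agent in agents if agent.lower() in named}


# Hypothetical robots.txt with only a generic group, as the finding describes.
SAMPLE = """User-agent: *
Disallow: /cart
"""

print(agents_with_explicit_rules(SAMPLE))
```

An empty result means no AI crawler is addressed by name, so compliant bots follow only the generic `*` group.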
JS-Rendering Risk
The homepage delivers 791 words of visible text from a plain GET, but the Gatsby framework means much of the product catalog, pricing, and interactive content is loaded client-side. The /pages/book-a-video-hair-consultation page returns only 29 words of visible text — a clear JS shell. AI crawlers that do not execute JavaScript (most of them) see a fraction of the actual content. Product pages like /products/miracle-system fare better (1,308 words) due to server-rendered schema, but collection pages and many sub-pages are thin.
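A visible-word count like the ones quoted above can be estimated from the raw HTML of a plain GET, without executing JavaScript. The sketch below is one possible approach using only the Python standard library; the counting rules (skipping script, style, template, and title content) are an assumption and may differ from the method behind the figures in this report.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Counts whitespace-separated words outside non-rendered elements."""

    SKIP = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside skipped elements
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.words += len(data.split())


def visible_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.words


# Hypothetical shell page: a little static text plus an empty mount point.
SHELL = (
    "<html><head><title>T</title><style>p{margin:0}</style></head>"
    '<body><div id="app"></div><p>Book a consultation</p>'
    '<script>var a = "hello world";</script></body></html>'
)
print(visible_word_count(SHELL))
```

Running the same function over each key URL gives a quick list of pages whose static HTML carries too little text for a non-JS crawler.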
Schema Posture
The homepage carries Corporation and WebSite schema with social links and a SearchAction. Product pages use ProductGroup with AggregateRating (4.4/5 from 852 reviews for the Miracle System). The collection page uses CollectionPage with an ItemList of 32 products, each with rich Product schema including price, shipping, and return policy. The salon page uses HairSalon schema with address, phone, geo, and hours. Blog articles use Article schema with author and date. Missing: FAQPage schema on the FAQ sections present on multiple pages, BreadcrumbList on any page, Organization schema with logo, and Product schema on individual product variant pages.
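A schema inventory of this kind can be reproduced by collecting the JSON-LD blocks in a page and listing their `@type` values. The following is a minimal sketch under stated assumptions: it only reads `application/ld+json` script blocks (not microdata or RDFa), and the sample markup is invented for illustration rather than copied from the site.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            self.blocks.append("".join(self.buffer))


def schema_types(html: str) -> set:
    """Return every @type string found anywhere in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = set()
    for block in collector.blocks:
        try:
            stack = [json.loads(block)]
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
        while stack:
            node = stack.pop()
            if isinstance(node, dict):
                kind = node.get("@type")
                if isinstance(kind, str):
                    found.add(kind)
                elif isinstance(kind, list):
                    found.update(k for k in kind if isinstance(k, str))
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)
    return found


# Hypothetical homepage markup resembling the types named above.
PAGE = (
    '<script type="application/ld+json">'
    '[{"@type": "Corporation", "name": "Example"},'
    ' {"@type": "WebSite", "potentialAction": {"@type": "SearchAction"}}]'
    "</script>"
)
WANTED = {"FAQPage", "BreadcrumbList"}
print(sorted(WANTED - schema_types(PAGE)))
```

The difference between the wanted set and the found set is the list of missing types for that page.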
Cold-Knowledge Gap
The LLM knows Josh Wood as a "celebrity colorist" with clients including Kate Moss and Victoria Beckham, and recalls the brand's "Blended" range, "Root Retouch," and "Gloss" products. The site itself heavily promotes the "Miracle System" (Permanent Colour + Shade Shot + Miracle Shot) as its flagship — a product the LLM did not mention at all. The site also emphasizes clinically proven claims (62% less breakage, 2.7x stronger, 38% less fade) and a London Atelier salon with a 40+ person team — details absent from the LLM's prior knowledge. The brand's positioning as a "no damage" colour system that strengthens hair over time is its key differentiator, yet the cold model described the brand in generic terms without this central claim.
External Signals
DNS records point to a Shopify backend (a shopify-verification-code record), Cloudflare CDN, Google Workspace email, Klaviyo for email marketing, and Zendesk for support. The site links to an Essentials UK wholesale portal and a MECCA Australia partnership. No Reddit threads, Trustpilot pages, or major press citations surfaced in search. The brand's external citation footprint therefore appears thin, which limits the signal AI engines can triangulate from off-domain sources.
Findings
Key pages render as JavaScript shells with minimal visible text (High)
The site is a Gatsby single-page application. Pages like /pages/book-a-video-hair-consultation deliver only 29 words of visible text without JavaScript execution. Most AI crawlers do not execute JavaScript, so they see a fraction of the actual content.
What to change: Implement server-side rendering or static generation for all key pages, especially product and consultation pages, to ensure meaningful HTML is delivered without JavaScript.
Robots.txt lacks AI-specific directives (Medium)
The robots.txt file contains no rules for GPTBot, ClaudeBot, Google-Extended, or other AI crawlers. All bots currently receive a 200 status, but without explicit directives the site states no policy on AI crawling, and compliant crawlers fall back to whatever the generic rules allow.
What to change: Add explicit allow/disallow rules for AI crawlers in robots.txt, and consider using a separate AI-specific sitemap.
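As an illustration of the recommendation, the sketch below generates per-crawler groups plus a Sitemap line for robots.txt. The allow/disallow choices in `POLICY` are placeholders rather than a recommended policy, and the `https://joshwoodcolour.com` base URL is an assumption built from the domain and sitemap path named in this report.

```python
def ai_robots_groups(policy: dict, sitemap_url: str) -> str:
    """Build one robots.txt group per crawler token, then a Sitemap line.

    `policy` maps a user-agent token to True (allow all) or False (block all).
    """
    lines = []
    for agent, allowed in policy.items():
        lines.append(f"User-agent: {agent}")
        lines.append("Allow: /" if allowed else "Disallow: /")
        lines.append("")  # blank line separates groups
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


# Placeholder policy: the site owner decides which bots to allow.
POLICY = {"GPTBot": True, "ClaudeBot": True, "Google-Extended": True}

print(ai_robots_groups(POLICY, "https://joshwoodcolour.com/sitemap-0.xml"))
```

The generated text would be appended to the existing robots.txt so the generic rules stay in place for all other crawlers.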
llms.txt returns 404, serving JS shell (High)
The /llms.txt endpoint, a proposed convention rather than a formal standard, returns a 404 page that is itself a Gatsby JS shell, providing no structured information for AI crawlers.
What to change: Create a valid llms.txt file with a summary of the site's content and links to key pages.
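A starting point for such a file can be generated from a small content map. This sketch follows the layout in the llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links); the section names, link notes, and `BASE` URL are assumptions for illustration, and only the two paths come from this report.

```python
BASE = "https://joshwoodcolour.com"  # assumed canonical origin


def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link lists."""
    out = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out.append(f"## {heading}")
        out.append("")
        for name, url, note in links:
            out.append(f"- [{name}]({url}): {note}")
        out.append("")
    return "\n".join(out)


# Illustrative content; wording should come from the brand, not this sketch.
SECTIONS = {
    "Products": [
        ("Miracle System", f"{BASE}/products/miracle-system",
         "Permanent Colour + Shade Shot + Miracle Shot"),
    ],
    "Services": [
        ("Video hair consultation", f"{BASE}/pages/book-a-video-hair-consultation",
         "Booking page for video consultations"),
    ],
}

print(build_llms_txt("Josh Wood Colour", "Hair colour brand and London salon.", SECTIONS))
```

The output should be served as plain text at /llms.txt with a 200 status, not routed through the JavaScript application.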
Standard sitemap.xml returns 404 (Medium)
The sitemap at /sitemap.xml returns a 404. The actual sitemap is at /sitemap-0.xml but is not referenced from the standard location, which may confuse crawlers.
What to change: Redirect /sitemap.xml to /sitemap-0.xml or serve a sitemap index at the standard location.
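If a sitemap index is the chosen fix, its XML is small enough to generate directly. The sketch below builds an index in the sitemaps.org format that points at the existing /sitemap-0.xml; the `https://joshwoodcolour.com` origin is an assumption, and how the file is deployed (Gatsby plugin, Cloudflare rule, or static file) is left open.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_index(sitemap_urls) -> str:
    """Return a sitemap index document listing each child sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Assumed origin; the child sitemap path is the one found during the audit.
INDEX_XML = sitemap_index(["https://joshwoodcolour.com/sitemap-0.xml"])
print(INDEX_XML)
```

Serving this document at /sitemap.xml lets crawlers that only probe the default location discover all 765 URLs.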
FAQPage schema missing from pages with FAQ content (Medium)
Multiple pages contain FAQ sections but lack FAQPage structured data, which is commonly used by AI assistants to extract Q&A content.
What to change: Add FAQPage schema to pages with FAQ sections.
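The markup itself is a small JSON-LD object that can be built from the question and answer text already on the page. In this sketch the helper name is hypothetical and the question/answer pair is a placeholder, since the site's actual FAQ wording is not quoted in this report.

```python
import json


def faq_jsonld(pairs) -> dict:
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder text: replace with the exact Q&A shown on the page.
FAQ = faq_jsonld([("Placeholder question from the page?", "Placeholder answer from the page.")])

# Embed the result in a <script type="application/ld+json"> tag in the static HTML.
print(json.dumps(FAQ, indent=2))
```

The JSON-LD text should match the visible FAQ content and be present in the server-delivered HTML rather than injected client-side.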
BreadcrumbList schema absent across the site (Low)
No page includes BreadcrumbList structured data, which helps crawlers understand site hierarchy.
What to change: Add BreadcrumbList schema to all pages.
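Breadcrumb markup can be derived from each page's position in the site hierarchy. The sketch below builds a schema.org BreadcrumbList from an ordered trail; the two-step trail and the `https://joshwoodcolour.com` origin are assumptions for illustration, as the site's real navigation hierarchy is not documented here.

```python
import json


def breadcrumb_jsonld(trail) -> dict:
    """Build BreadcrumbList JSON-LD from an ordered list of (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


# Hypothetical trail for the product page named in this report.
TRAIL = [
    ("Home", "https://joshwoodcolour.com/"),
    ("Miracle System", "https://joshwoodcolour.com/products/miracle-system"),
]
CRUMBS = breadcrumb_jsonld(TRAIL)
print(json.dumps(CRUMBS, indent=2))
```

Positions start at 1 and follow the order of the trail, from the site root to the current page.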
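Organization schema missing logo (Low)
The homepage uses Corporation and WebSite schema with social links, but no logo is declared. Corporation is a schema.org subtype of Organization, so the existing markup can carry the logo property directly. A declared logo could improve brand recognition in AI knowledge graphs.
What to change: Add a logo URL to the homepage's Corporation (Organization) schema, keeping the existing social profile URLs alongside it.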
LLM prior knowledge lacks flagship Miracle System product (High)
The LLM knows the brand in generic terms but does not mention the Miracle System, which is the site's primary product. Key differentiators like 'no damage' and clinical claims are absent from AI knowledge.
What to change: Ensure product pages are fully server-rendered with rich schema, and consider publishing authoritative content (e.g., press releases, expert articles) that AI can cite.
Thin external citation footprint limits AI triangulation (Medium)
Searches for the brand on Reddit, Trustpilot, and major press outlets returned zero results. The brand's off-domain presence is minimal, reducing signals AI can use to verify and enrich knowledge.
What to change: Encourage customer reviews on third-party platforms and pursue press coverage to build external citation signals.
Collection pages may be content-thin for AI crawlers (Medium)
The all-products collection page returns 2,773 words, but many sub-pages likely rely on client-side rendering for product listings, reducing visible content for non-JS crawlers.
What to change: Ensure collection pages are server-rendered with full product listings in HTML.
What's working
- Product pages include ProductGroup schema with AggregateRating — Product pages like /products/miracle-system use ProductGroup schema with AggregateRating (4.4/5 from 852 reviews), providing rich data for AI crawlers.
- Collection page uses CollectionPage with ItemList of 32 products — The all-products collection page includes CollectionPage schema with an ItemList of 32 products, each with rich Product schema including price, shipping, and return policy.
- Salon page uses HairSalon schema with full contact details — The London Atelier page includes HairSalon schema with address, phone, geo coordinates, and opening hours, aiding local AI visibility.
- Blog articles use Article schema with author and date — Blog posts like the ammonia-free hair dye article include Article schema with author and publication date, which helps AI understand content provenance.
- Homepage includes Corporation and WebSite schema with SearchAction — The homepage uses Corporation and WebSite schema, including a SearchAction, providing basic brand and site information to AI crawlers.
- All 11 tested AI bots receive 200 status with no blocking — No AI crawler is blocked by robots.txt or server configuration; all receive a 200 status, ensuring they can access the site (though content is JS-dependent).
- Site uses Cloudflare CDN for performance and security — Cloudflare provides CDN and security benefits, improving page load speed and protection, which indirectly benefits crawler efficiency.