liverpoolmuseums.org.uk — AI Site Grade
Liverpool Museums site delivers JavaScript shells to AI crawlers, blocking all substantive content despite permissive robots.txt.
The site's Next.js architecture returns almost no visible text to AI crawlers, PerplexityBot, ChatGPT-User, and OAI-SearchBot are blocked at the CDN, and none of the pages tested carries structured data.
- Findings: 12
- Evidence checks: 41
- Completed: 30 May 2026
Analysis
The site is a JavaScript shell that delivers zero visible content to AI crawlers despite allowing them in robots.txt
The homepage and every venue, story, and artifact page tested return 0-51 words of visible text from a plain GET: a Next.js application shell with no server-side rendering. GPTBot, ClaudeBot, Google-Extended, and Applebot-Extended all receive a 200 status with 26-34KB of HTML that is almost entirely <script> tags and meta tags, containing no substantive body text. PerplexityBot, ChatGPT-User, and OAI-SearchBot are blocked at the CloudFront edge with a 403. The robots.txt is a single rule (User-agent: * Disallow:) with no AI-specific directives, and the request for llms.txt times out, which for a crawler is equivalent to the file being missing.
Crawler Access
GPTBot, ClaudeBot, Google-Extended, and Applebot-Extended all get a 200 but receive a JS shell. The response body contains <script> tags loading a Next.js application, <meta> tags, and <link> tags — but virtually zero readable content. The homepage yields 29 words of visible text; the Museum of Liverpool page yields 0 words; the International Slavery Museum page yields 0 words; even an individual artifact page yields only 12 words. PerplexityBot, ChatGPT-User, and OAI-SearchBot are blocked at the CloudFront CDN with a 403 — the only bots that receive a clear rejection. The robots.txt has no AI-bot rules at all, meaning the blocking happens at the WAF/CDN layer, not via robots.txt directives.
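The word counts above can be approximated by stripping non-rendering markup from the raw HTML a crawler receives. The sketch below is one way to do that with Python's standard library. It is not the tool used for this report, and the set of tags treated as invisible (script, style, template, head) is an assumption, so its counts may differ slightly from the figures quoted here.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects words that sit outside tags which never render as body text."""

    # Assumption: these tags are treated as contributing no visible text.
    HIDDEN_TAGS = {"script", "style", "template", "head"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Only count text when we are not inside a hidden tag.
        if self._hidden_depth == 0:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    """Return the number of whitespace-separated words a non-JS client would see."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(parser.words)
```

Fed a response whose body holds only an empty mount point and script tags, visible_word_count returns 0, which is the pattern reported for the venue pages.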
Content & Schema Posture
Zero JSON-LD structured data exists on any page tested — no Museum, Organization, Event, Exhibition, CreativeWork, or FAQPage schema. The homepage has no schema at all. The collections page has no schema despite describing "one of the most important and varied collections in Europe." The artifact pages have no VisualArtwork or CreativeWork schema. Heading structure is minimal: every page has an H1 (the page title) and a set of boilerplate H2 navigation headings ("Main menu", "About", "Support", "Resource", "Venue hire", "Stay in touch"). No FAQ, comparison, or table answer signals are present.
Cold-Knowledge Gap
The model knows National Museums Liverpool as a group of seven museums including the Museum of Liverpool, World Museum, and International Slavery Museum, and recalls a 2023 controversy where the International Slavery Museum faced criticism for downplaying British colonialism. The site itself contains no mention of this controversy: no apology, no statement, no curatorial review page. The model also knows the Museum of Liverpool opened in 2011 on the UNESCO World Heritage waterfront (delisted 2021), but the site's venue pages contain no historical narrative, no architectural details, and no mention of the UNESCO status or delisting. The gap between what the model knows from external sources and what the site communicates is near-total: the site provides almost no substantive text for any AI to index.
External Signals & Surprising Findings
The sitemap lists roughly 75,000 URLs across 30 paginated sitemaps, dominated by individual artifact pages (/artifact/...). These artifact pages each return only 12 words of visible text: a title and a single metadata field ("Artist = ..."). The site is served by nginx behind AWS CloudFront, with DNS from Zen Internet. A Shopify subdomain (national-museums-liverpool.myshopify.com) handles the museum shop. The site sends permissions-policy: interest-cohort=() and strict-transport-security: max-age=63072000; preload. The og:image on the homepage references a 2026 exhibition ("Gender Stories"), suggesting content is current but invisible to crawlers. The site has no llms.txt, no canonical tags on any page tested, and no server-side rendering, which is a critical architectural failure for AI visibility.
Findings
All pages render as empty JavaScript shells to AI crawlers High
Every page tested returns 0-51 words of visible text from a plain GET, delivering a Next.js application shell with no server-side rendering. GPTBot, ClaudeBot, Google-Extended, and Applebot-Extended receive 200 status with 26-34KB of HTML containing almost no substantive body text.
What to change: Implement server-side rendering (SSR) or static site generation (SSG) for all pages so that AI crawlers receive fully rendered HTML with visible content.
PerplexityBot, ChatGPT-User, and OAI-SearchBot blocked at CDN High
These three AI crawlers receive a 403 from CloudFront, preventing any access to the site. The robots.txt has no AI-bot rules, so blocking occurs at the WAF/CDN layer.
What to change: Remove the CDN/WAF rules blocking PerplexityBot, ChatGPT-User, and OAI-SearchBot. If some bots should stay blocked, state that policy in robots.txt as well and keep the CDN/WAF rules aligned with it, since the CDN does not read robots.txt.
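One way to re-check this after a rule change is to request the same URL with each crawler's product token and compare status codes. The following is a minimal sketch under stated assumptions: the short tokens stand in for the longer User-Agent strings real crawlers send, the helper names are hypothetical, and a WAF that also filters by IP range will treat a spoofed request differently from a genuine crawler.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: simplified product tokens, not the crawlers' full User-Agent strings.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "ChatGPT-User", "OAI-SearchBot"]


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code received for one User-Agent."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return error.code


def blocked_agents(url, user_agents=AI_USER_AGENTS, fetch=fetch_status):
    """List the agents that receive an explicit 401 or 403 rejection."""
    statuses = {agent: fetch(url, agent) for agent in user_agents}
    return sorted(agent for agent, status in statuses.items() if status in (401, 403))
```

The fetch parameter is injectable so the comparison logic can be exercised without network access. Against the live site, an empty list from blocked_agents would indicate the 403s are gone for these tokens.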
Zero JSON-LD structured data on any page High
No page tested contains JSON-LD for Museum, Organization, Event, Exhibition, CreativeWork, or any other schema type. The collections page lacks schema despite describing a major collection. Artifact pages have no VisualArtwork or CreativeWork schema.
What to change: Add JSON-LD structured data for each page type: Museum, Organization, Event, Exhibition, CreativeWork, and VisualArtwork on artifact pages.
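As an illustration of what a venue page could emit, the sketch below builds a schema.org Museum block and wraps it in a script tag. The function name, the choice of properties, and any values passed in are assumptions for illustration; the real markup should be generated from the site's own CMS data and rendered server-side.

```python
import json


def museum_jsonld(name: str, url: str, description: str) -> str:
    """Return a <script> tag containing schema.org Museum JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Museum",
        "name": name,
        "url": url,
        "description": description,
    }
    # Escape "</" so a stray "</script>" in the data cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

A call such as museum_jsonld("Museum of Liverpool", "https://example.org/museum-of-liverpool", "Placeholder description.") uses a placeholder URL and description. The same pattern extends to Event, CreativeWork, or VisualArtwork by changing "@type" and the properties.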
llms.txt file times out or is missing Medium
The llms.txt endpoint returns a read timeout, effectively providing no AI-specific guidance or content summary.
What to change: Create an llms.txt file that summarizes the site's content and provides links to key pages for AI crawlers.
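The llms.txt proposal describes a Markdown file with an H1 title, a blockquote summary, and H2 sections containing link lists. A minimal generator might look like the sketch below. The function name is hypothetical, and every URL and description passed to it should come from the site itself, not from this example.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt body.

    sections maps a section heading to a list of (label, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"
```

The result should be served as plain text at /llms.txt and respond quickly, since the current endpoint times out.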
Robots.txt has no AI-specific directives Medium
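The robots.txt contains only a single rule allowing all user-agents, with no specific rules for AI crawlers. Access for AI bots is therefore decided entirely at the CDN, which blocks some crawlers and admits others without any published policy, so the stated policy and the actual behaviour disagree.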
What to change: Add explicit directives for AI crawlers in robots.txt, such as allowing GPTBot and ClaudeBot while optionally blocking others, to align with CDN rules.
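A draft robots.txt can be checked before deployment with Python's urllib.robotparser. The file below is a sketch of the kind of policy described above, not a recommendation of which bots to block: ExampleBlockedBot is a made-up name standing in for any crawler the museum decides to exclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft policy; ExampleBlockedBot is a placeholder name.
PROPOSED_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: ExampleBlockedBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this draft, allowed(PROPOSED_ROBOTS_TXT, "GPTBot", "https://example.org/artifact/1") is True and the same call for ExampleBlockedBot is False. Whatever the final policy, the CDN/WAF rules need to match it, because robots.txt alone does not change what the edge returns.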
Artifact pages contain only 12 words of visible text High
Individual artifact pages, which dominate the sitemap with ~75,000 URLs, return only a title and a single metadata field. This provides no substantive content for AI indexing.
What to change: Include full artifact descriptions, provenance, and historical context in the HTML body, rendered server-side.
No canonical tags on any page tested Low
Pages lack canonical URL tags, which can lead to duplicate content issues and dilute SEO signals for AI crawlers.
What to change: Add self-referencing canonical tags to all pages.
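Whether a page carries a self-referencing canonical can be verified from the raw HTML. The sketch below, with hypothetical helper names, collects every link element whose rel includes "canonical" and checks that there is exactly one and that it matches the page's own URL. It compares URLs as plain strings, which is an assumption; a stricter check would normalise trailing slashes and case first.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects href values from <link rel="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


def is_self_referencing(html: str, page_url: str) -> bool:
    """True only if the page declares exactly one canonical, equal to page_url."""
    return find_canonicals(html) == [page_url]
```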
Site omits known controversy about International Slavery Museum Medium
The model recalls a 2023 controversy where the International Slavery Museum faced criticism for downplaying British colonialism, but the site contains no apology, statement, or curatorial review page addressing this.
What to change: Publish a dedicated page or statement addressing the controversy, with historical context and curatorial perspective.
Venue pages omit UNESCO World Heritage status and delisting Medium
The Museum of Liverpool page lacks any mention of its location on the UNESCO World Heritage waterfront or the 2021 delisting, which the model knows from external sources.
What to change: Add historical and architectural context to venue pages, including UNESCO status and delisting information.
Heading structure is minimal and boilerplate Low
Every page has only an H1 (page title) and boilerplate H2 navigation headings. No descriptive subheadings exist to help AI crawlers understand content hierarchy.
What to change: Add descriptive H2 and H3 headings that outline the page's content sections.
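To tell descriptive headings apart from navigation boilerplate, a page's heading outline can be extracted and filtered. The sketch below is one possible check, with hypothetical names. The boilerplate set is taken from the navigation headings listed in this report and would need updating if the site's template changes.

```python
from html.parser import HTMLParser

# Navigation headings observed on every page, per this report.
BOILERPLATE = {"main menu", "about", "support", "resource", "venue hire", "stay in touch"}


class HeadingOutline(HTMLParser):
    """Records (level, text) for each h1-h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Join fragments and normalise internal whitespace.
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._level, text))
            self._level = None


def descriptive_headings(html: str) -> list:
    """Return H2-H6 headings that are not known navigation boilerplate."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return [(level, text) for level, text in parser.headings
            if level > 1 and text.lower() not in BOILERPLATE]
```

An empty result for a server-rendered page means the page still has no content subheadings beyond its H1 and the shared navigation.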
No FAQ, comparison, or table answer signals present Medium
The site lacks any FAQPage schema, comparison tables, or other structured answer formats that AI crawlers use for rich results.
What to change: Add FAQPage schema to visit and what's on pages, and consider comparison tables for exhibitions.
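FAQPage markup nests each question and its accepted answer under mainEntity. The sketch below shows that structure, assuming the questions and answers are supplied as pairs from the site's own visitor information. The function name is hypothetical and no real FAQ content is implied.

```python
import json


def faq_jsonld(pairs) -> str:
    """Return schema.org FAQPage JSON-LD for a list of (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)
```

The output belongs inside a script tag of type application/ld+json, and the same questions and answers should also appear as visible text on the page.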
No llms.txt file available Medium
The llms.txt endpoint times out, providing no AI-specific guidance or content summary for crawlers.
What to change: Create an llms.txt file that summarizes the site's content and provides links to key pages for AI crawlers.
What's working
- Robots.txt allows all crawlers — The robots.txt file has a single rule allowing all user-agents, which is a positive baseline for AI crawler access.
- GPTBot and ClaudeBot receive 200 status — These major AI crawlers are not blocked at the CDN and receive a 200 response, allowing them to at least access the site's HTML.
- Sitemap contains ~75,000 URLs with good coverage — The sitemap is well-structured with 30 paginated sitemaps covering artifact pages, venue pages, and stories, providing a comprehensive URL list for crawlers.
- HTTPS with strong security headers — The site uses HTTPS with HSTS preload and permissions-policy, ensuring secure connections and privacy protection.
- Open Graph image references current exhibition — The homepage og:image references a 2026 exhibition ('Gender Stories'), indicating content is up-to-date.
- Collections page provides a brief description — The collections page contains 51 words of visible text describing the collection's significance, which is more than other pages.
- Stories page returns 17 words of visible text — The stories page has a small amount of visible text, slightly more than venue pages.
- Artifact pages include title and one metadata field — Artifact pages provide at least a title and an artist field, offering minimal but structured information.