AI Site Grade
hiscoxgroup.com — AI Site Grade
Hiscox Group's sitemap lists 900+ double-slash URLs that the site's own canonical tags mark as non-canonical, diluting the signal-to-noise ratio for AI crawlers.
Hiscox Group's corporate site has strong AI crawler access and server-rendered content, but suffers from sitemap pollution, broken per-page schema, and a cold-knowledge gap around pandemic claims litigation.
- Findings: 8
- Evidence checks: 20
- Completed: 30 May 2026
Analysis
900+ sitemap entries use double-slash URLs that the site itself marks non-canonical
The Hiscox Group corporate site at hiscoxgroup.com has a strong technical foundation for AI crawlers — every major bot (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended) receives a 200 with full HTML content, identical to browser delivery, served through Cloudflare with no UA-based blocking. The homepage renders ~500 words of visible text, all server-rendered (Drupal), with no JS-shell risk. Yet the site harbors a structural flaw that undermines this openness.
Sitemap and Canonical Pollution
The sitemap.xml contains 916 URLs, nearly all with a double-slash prefix (https://www.hiscoxgroup.com//news/press-releases/2011/...). These double-slash URLs resolve to 200 but carry a canonical tag pointing to the single-slash version (https://www.hiscoxgroup.com/news/press-releases/2011/...). This means every AI crawler ingesting the sitemap discovers 900+ duplicate-path URLs that the site itself declares non-canonical. The sitemap also includes press releases dating back to 2011, creating a long tail of stale content that dilutes the signal-to-noise ratio for any AI model indexing the domain.
Cold-Knowledge Gap
A frontier LLM queried cold about Hiscox Ltd describes it accurately as a global specialist insurer with three segments (Retail, London Market, Re & ILS), founded 1901, LSE-listed, a major Lloyd's syndicate, and known for cyber, fine art, and professional indemnity. It also recalls the 2021 UK Supreme Court business interruption test case — a reputational signal the site itself does not address. The site's homepage and investor pages emphasize "record results," "transformation," and "AI in underwriting," but contain zero mention of the pandemic claims litigation that remains part of the brand's public record. An AI model relying solely on the site would miss this context entirely.
Schema and llms.txt Gaps
Every page inspected carries the same Article JSON-LD schema pointing to the same blog post (/news/blog/art-business-transformation-qa-aki-hussain) as its mainEntityOfPage — regardless of whether the page is an investor page, an about page, or a press release. The homepage does include an Organization and InsuranceAgency schema with address, phone, email, and social profiles, but the per-page Article schema is functionally broken: it describes a single blog post as the main entity of every page on the site. No FAQPage, BreadcrumbList, WebSite, or Product schema exists. The /llms.txt returns 404, and robots.txt has no AI-bot-specific directives — all bots are governed by the catch-all * rule, which disallows /search/, /admin/, and *.pdf but allows everything else.
Content Architecture
The site is a Drupal 10+ installation behind Cloudflare with strong security headers (HSTS, CSP, X-Frame-Options). Content is text-rich and server-rendered — the blog post on business transformation runs ~1,400 words and discusses AI models in underwriting, Google Cloud collaboration, and digital platform acquisitions. However, the news section mixes 2026 results with 2011 press releases under the same taxonomy, and the "Latest news" sidebar repeats the same three items across every page, creating redundant internal linking. No FAQ, comparison, or table answer-format signals are present on any page checked.
Findings
Sitemap contains 900+ double-slash URLs that conflict with their own canonical tags (severity: High)
The sitemap.xml lists 916 URLs, nearly all with a double-slash prefix (e.g., https://www.hiscoxgroup.com//news/...). These resolve to 200 but carry a canonical tag pointing to the single-slash version, creating duplicate-path entries that dilute the signal-to-noise ratio for AI crawlers.
What to change: Regenerate the sitemap to use single-slash URLs only, and remove or redirect the double-slash variants.
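One way to approach the cleanup is a small normalization pass over the URL list before the sitemap is written. The sketch below is an assumption about how that could look, not the site's actual sitemap generator: the function names are hypothetical, and it only collapses repeated slashes in the path while leaving the scheme and host untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_path_slashes(url: str) -> str:
    """Collapse runs of '/' in the path only; scheme, host, and query are kept."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment)
    )


def dedupe_sitemap_urls(urls):
    """Normalize every URL and drop duplicates, preserving first-seen order."""
    seen = set()
    result = []
    for url in urls:
        fixed = normalize_path_slashes(url)
        if fixed not in seen:
            seen.add(fixed)
            result.append(fixed)
    return result


if __name__ == "__main__":
    # Illustrative input only; real entries would come from sitemap.xml.
    sample = [
        "https://www.hiscoxgroup.com//news/",
        "https://www.hiscoxgroup.com/news/",
    ]
    for url in dedupe_sitemap_urls(sample):
        print(url)
```

Normalizing the sitemap does not by itself retire the double-slash variants, so a server-side 301 from the double-slash path to the single-slash path would still be needed for URLs that crawlers have already discovered.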
Sitemap includes press releases from 2011, creating stale content tail (severity: Medium)
The sitemap contains press releases dating back to 2011, which are unlikely to be relevant for AI models and dilute the site's overall content quality signal.
What to change: Drop press releases older than a chosen threshold (e.g., 5 years) from the sitemap, or mark those pages noindex.
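An age filter of this kind could be sketched as follows. It assumes the publication year appears as a path segment directly after /press-releases/, as in the URLs quoted in this report; the function names, the explicit current_year argument, and the default of 5 years are illustrative choices rather than anything the site defines.

```python
import re
from urllib.parse import urlsplit

# Assumption: press-release URLs look like /news/press-releases/<year>/...
YEAR_SEGMENT = re.compile(r"/press-releases/(\d{4})(?:/|$)")


def press_release_year(url):
    """Return the year embedded in a press-release URL, or None if absent."""
    match = YEAR_SEGMENT.search(urlsplit(url).path)
    return int(match.group(1)) if match else None


def keep_in_sitemap(url, current_year, max_age_years=5):
    """Keep undated pages; keep dated press releases only within the threshold."""
    year = press_release_year(url)
    if year is None:
        return True  # not a dated press release, so leave it alone
    return current_year - year <= max_age_years
```

Passing the reference year explicitly keeps the filter deterministic, which makes it easier to review exactly which entries a given threshold would remove before the sitemap is regenerated.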
Every page uses the same Article schema pointing to a single blog post (severity: High)
All inspected pages carry an Article JSON-LD schema with mainEntityOfPage pointing to the same blog post (/news/blog/art-business-transformation-qa-aki-hussain), regardless of the page's actual content. This misleads AI crawlers about the page's primary entity.
What to change: Implement per-page schema that accurately reflects each page's content (e.g., Organization for about pages, NewsArticle for press releases).
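A minimal sketch of per-page schema generation is shown here, assuming the template has access to each page's own URL and title. The path-to-type mapping and the function names are hypothetical; the key point is that mainEntityOfPage is derived from the page being rendered instead of being hard-coded.

```python
import json
from urllib.parse import urlsplit


def schema_type_for(path):
    """Pick a schema.org type from the URL path (illustrative mapping only)."""
    if path.startswith("/news/press-releases/"):
        return "NewsArticle"
    if path.startswith("/news/blog/"):
        return "BlogPosting"
    return "WebPage"


def build_page_schema(page_url, headline):
    """Return JSON-LD whose mainEntityOfPage is the page's own URL."""
    data = {
        "@context": "https://schema.org",
        "@type": schema_type_for(urlsplit(page_url).path),
        "headline": headline,
        "mainEntityOfPage": {"@type": "WebPage", "@id": page_url},
    }
    return json.dumps(data, indent=2)
```

The returned string would be placed inside a script tag of type application/ld+json in the page head. Organization markup for about pages would need its own property set and is not covered by this sketch.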
llms.txt returns 404, missing opportunity to guide AI crawlers (severity: Medium)
The /llms.txt endpoint returns a 404, meaning the site does not offer the curated index that the proposed llms.txt convention uses to point LLMs at a site's most important pages.
What to change: Create an llms.txt file listing key pages (e.g., about, strategy, news) to help AI crawlers prioritize content.
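As a rough illustration, the file could be rendered from a small page list. The layout below follows the Markdown structure described in the llms.txt proposal (a title, a short blockquote summary, then sections of links), which is a community convention rather than a ratified standard. The function name, the summary wording, and the /about and /strategy paths are placeholders that would need to be replaced with the site's real pages.

```python
def render_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url in links:
            lines.append(f"- [{title}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder entries; only the blog URL is taken from this report.
    sections = {
        "Company": [
            ("About", "https://www.hiscoxgroup.com/about"),
            ("Strategy", "https://www.hiscoxgroup.com/strategy"),
        ],
        "News": [
            (
                "Business transformation Q&A",
                "https://www.hiscoxgroup.com/news/blog/"
                "art-business-transformation-qa-aki-hussain",
            ),
        ],
    }
    print(render_llms_txt("Hiscox Group", "Global specialist insurer.", sections))
```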
Site omits pandemic claims litigation that is part of brand's public record (severity: Medium)
A frontier LLM recalls the 2021 UK Supreme Court business interruption test case involving Hiscox, but the site's homepage and investor pages make no mention of this litigation. AI models relying solely on the site would miss this significant reputational context.
What to change: Add a section or FAQ addressing the business interruption test case and its resolution to provide balanced context.
Latest news sidebar repeats same three items across all pages (severity: Low)
The 'Latest news' sidebar on every page displays the same three items, creating redundant internal linking and reducing the diversity of links for crawlers.
What to change: Make the sidebar dynamic to show different recent news items based on the page context or category.
No FAQPage, BreadcrumbList, or Product schema on any page (severity: Medium)
The site lacks structured data for FAQs, breadcrumbs, or products, missing opportunities to enhance AI understanding and rich snippet eligibility.
What to change: Add BreadcrumbList schema to all pages, and consider FAQPage schema for pages with Q&A content.
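Breadcrumb markup can often be derived from the URL path. The sketch below assumes that every intermediate path (for example /news and /news/blog) resolves to a real page and that a slug-derived label is acceptable; in practice the CMS menu titles would give better names. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def breadcrumb_schema(page_url):
    """Build a schema.org BreadcrumbList dict from the segments of a URL path."""
    parts = urlsplit(page_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    segments = [segment for segment in parts.path.split("/") if segment]

    items = [
        {"@type": "ListItem", "position": 1, "name": "Home", "item": origin + "/"}
    ]
    trail = ""
    for position, segment in enumerate(segments, start=2):
        trail += "/" + segment
        items.append(
            {
                "@type": "ListItem",
                "position": position,
                # Slug-derived label; a real template would use the page title.
                "name": segment.replace("-", " ").capitalize(),
                "item": origin + trail,
            }
        )
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }
```

Serializing the result with json.dumps and embedding it as JSON-LD gives each page a trail that matches its position in the site hierarchy. Filtering out empty segments also means a double-slash URL produces the same trail as its single-slash form.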
robots.txt has no AI-bot-specific directives (severity: Low)
The robots.txt file does not name any AI bots (e.g., GPTBot, ClaudeBot) and relies on a catch-all rule. While this currently allows all AI bots, it provides no granular control or guidance.
What to change: Add explicit directives for major AI bots to allow or disallow specific paths as needed.
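Explicit per-bot groups might look like the draft below, checked with Python's standard urllib.robotparser before deployment. The rule set is an assumption for illustration, not a recommended policy: it names only two of the bots mentioned in this report and reuses the /search/ and /admin/ exclusions from the existing catch-all rule. The *.pdf rule is left out because the standard-library parser matches path prefixes and does not evaluate wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Draft rules for illustration; the real policy is a business decision.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /news/
Disallow: /search/
Disallow: /admin/

User-agent: ClaudeBot
Allow: /news/
Disallow: /search/
Disallow: /admin/

User-agent: *
Disallow: /search/
Disallow: /admin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
        for path in ("/news/blog/", "/search/results", "/admin/"):
            url = "https://www.hiscoxgroup.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

Testing a draft this way catches accidental blocks before the file goes live, although individual crawlers may interpret edge cases differently from this parser.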
What's working
- All pages are server-rendered with full HTML content — Every inspected page returns a 200 with substantial server-rendered HTML text, ensuring AI crawlers can extract content without JavaScript execution.
- All major AI bots receive 200 with full content, no UA-based blocking — Eleven major AI bots (GPTBot, ClaudeBot, PerplexityBot, etc.) all receive a 200 response identical to browser delivery, with no user-agent filtering.
- Homepage includes Organization and InsuranceAgency schema with contact details — The homepage has valid JSON-LD for Organization and InsuranceAgency, including address, phone, email, and social profiles, which helps AI models correctly identify the entity.
- Site uses strong security headers (HSTS, CSP, X-Frame-Options) — Cloudflare serves the site with HSTS, Content Security Policy, and X-Frame-Options headers, indicating good security hygiene that indirectly supports trust for AI crawlers.
- Blog posts contain substantive, relevant content (e.g., AI in underwriting) — The blog post on business transformation runs ~1,400 words and discusses AI models, Google Cloud, and digital acquisitions, providing valuable context for AI models.
- Site built on Drupal 10+ with server-rendered pages — The use of Drupal 10+ ensures consistent server-side rendering and structured content management, reducing the risk of JS-dependent content.