AI Site Grade

futureplc.com — AI Site Grade

Future PLC's corporate site has a malformed robots.txt that fails to block most AI crawlers, no llms.txt, no AI-specific content strategy, and a future-dated dateModified that may confuse AI parsers.

Findings: 10
Evidence checks: 26
Completed: 30 May 2026

Analysis

The robots.txt has a structural defect: it lists Bytespider and 10 other AI crawlers without effectively blocking them, and it never mentions GPTBot, ClaudeBot, Google-Extended, or PerplexityBot. Separately, Cloudflare rate-limits ClaudeBot with a 429 even though robots.txt does not prohibit it.

Crawler Access

The robots.txt at futureplc.com/robots.txt contains a malformed multi-user-agent block. It lists 11 AI crawlers (bytespider, mistralai, cohere, ai2bot, youbot, omgili, diffbot, kangaroo, img2dataset, amazonbot, amazon-qbusiness) followed by two Meta agents (meta-externalagent, meta-webindexer), but Disallow: / takes effect only for the two agents on the final lines. The first 11 bots inherit only the * rules (disallow /search, /.preview/, /.context-results, /.applenews/), so they are not actually blocked from the homepage or any other content. GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, ChatGPT-User, PerplexityBot, and anthropic-ai are entirely absent from the file.

Observed responses do not match the file. The compare_bot_access check shows ClaudeBot receiving a Cloudflare 429 (rate-limited) while anthropic-ai gets a full 200. Both user-agent tokens are associated with Anthropic, so the same operator is throttled under one token and served under the other. Bytespider gets a Cloudflare 403. All other major AI crawlers receive a 200 with the full 84 KB HTML page, including JS, tracking scripts, and Google Tag Manager. No llms.txt exists (404). The site runs on WP Engine behind Cloudflare, served via nginx.
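
The gap between what robots.txt permits and what the edge actually returns can be checked mechanically. The following is a minimal sketch, not the audit tool's own method: it uses Python's standard urllib.robotparser, the wildcard rules and status codes quoted above, and a hypothetical helper named edge_mismatches. The real robots.txt contains more groups than the simplified ROBOTS_TXT shown here, and Python's parser is only one interpretation of the file.

```python
from urllib.robotparser import RobotFileParser

# Wildcard rules as reported in the audit; the real file has more groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /.preview/
Disallow: /.context-results
Disallow: /.applenews/
"""

# Homepage status codes as reported in the audit.
OBSERVED = {"GPTBot": 200, "ClaudeBot": 429, "anthropic-ai": 200, "Bytespider": 403}


def edge_mismatches(robots_txt, observed, url):
    """Return agents that robots.txt permits but that did not receive a 200."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: status
        for agent, status in observed.items()
        if parser.can_fetch(agent, url) and status != 200
    }


print(edge_mismatches(ROBOTS_TXT, OBSERVED, "https://futureplc.com/"))
# prints {'ClaudeBot': 429, 'Bytespider': 403}
```

Any agent in the result is being refused by something other than robots.txt, which in this audit points at the Cloudflare layer.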

Cold-Knowledge Gap

The LLM prior knows Future PLC as a Bath-headquartered, LSE-listed media group owning TechRadar, Tom's Guide, PC Gamer, GamesRadar+, Marie Claire UK, and CinemaBlend. It recalls acquisition-driven growth, 2023-2024 layoffs, print closures (Total Film), and criticism of affiliate-heavy editorial. The actual site presents a polished corporate narrative: 200+ brands, 479M people reached, 41B annual ad impressions, proprietary tech (Hawk, Aperture, Vanilla, Helix), and a timeline ending in 2025 with "cemented leadership." The site never acknowledges layoffs, restructuring, or print closures — the cold knowledge of reputational friction is entirely absent from the corporate domain. The homepage's dateModified is set to 2026-05-14, a future date, which may confuse temporal reasoning in AI parsers.
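
A freshness check for dateModified is easy to automate. The sketch below assumes the JSON-LD block is a single object rather than an @graph array, and the helper name future_dated and the SAMPLE string are illustrative only. The reference date is passed in explicitly so the result does not depend on when the check runs.

```python
import json
from datetime import date


def future_dated(json_ld, today):
    """Return True when the block's dateModified falls after `today`."""
    data = json.loads(json_ld)
    value = data.get("dateModified")
    if value is None:
        return False
    # Keep only the YYYY-MM-DD part so full timestamps also parse.
    modified = date.fromisoformat(value[:10])
    return modified > today


# Illustrative block carrying the date reported in the audit.
SAMPLE = '{"@type": "WebPage", "dateModified": "2026-05-14"}'
print(future_dated(SAMPLE, date.today()))
```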

Schema Posture

Every page carries the same three JSON-LD blocks: WebPage, BreadcrumbList, and Organization (with logo). The Organization schema is correctly marked as Future PLC with a logo URL. However, no WebSite schema includes isAccessibleForFree or potentialAction for AI consumption patterns. No FAQPage, Article, NewsArticle, or Product schema appears anywhere on the site. The brands page lists 50+ media properties with audience metrics but uses plain H3 headings — no ItemList or CollectionPage schema to structure the portfolio for AI extraction. The SearchAction schema targets ?s={search_term_string} which is a WordPress default, not a structured data endpoint.
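
As an illustration of the missing markup, the sketch below builds a WebSite block with isAccessibleForFree and a SearchAction, plus an ItemList for the brand portfolio. The function names are hypothetical, the search template reuses the ?s={search_term_string} pattern reported above, and the use of the schema.org Brand type for each entry is an assumption rather than something the site is known to use. Audience metrics are left out because the choice of property for them is a separate decision.

```python
import json


def website_schema(base_url, name):
    """Build a WebSite JSON-LD object with a SearchAction."""
    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "name": name,
        "url": base_url,
        "isAccessibleForFree": True,
        "potentialAction": {
            "@type": "SearchAction",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": base_url + "?s={search_term_string}",
            },
            "query-input": "required name=search_term_string",
        },
    }


def brand_item_list(brands):
    """Build an ItemList JSON-LD object with one ListItem per brand name."""
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": position,
                "item": {"@type": "Brand", "name": brand},
            }
            for position, brand in enumerate(brands, start=1)
        ],
    }


print(json.dumps(website_schema("https://futureplc.com/", "Future PLC"), indent=2))
print(json.dumps(brand_item_list(["TechRadar", "Tom's Guide", "PC Gamer"]), indent=2))
```

Each printed object would go inside its own script tag of type application/ld+json on the relevant page.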

External Signals

The site's external link profile is minimal: only X (Twitter) and LinkedIn in the footer. No press mentions, Reddit threads, or review citations surfaced in search. The "Future and AI" policy exists as a PDF buried under /policies/ (last updated December 2024) but is not linked from the homepage, news section, or any prominent navigation — AI crawlers would need to discover it through the sitemap. The news section shows active publishing through May 2026 (half-year results), indicating the site is well-maintained, but the absence of any AI-specific content strategy page, llms.txt, or structured data for brand portfolio means AI engines must infer the brand's scope from unstructured H3 headings and prose alone.
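
For a sense of what an llms.txt could contain, here is a small generator sketch. It follows the commonly proposed layout (a title heading, a one-line summary as a blockquote, then sections of annotated links), which is a community proposal rather than a formal standard. The function name render_llms_txt, the section names, and every URL path below are placeholders, not confirmed pages on futureplc.com.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt document from a title, summary, and link sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content; the paths are assumptions for illustration only.
SECTIONS = {
    "Key pages": [
        ("Brands", "https://futureplc.com/brands/", "Brand portfolio and audience figures"),
        ("News", "https://futureplc.com/news/", "Corporate announcements and results"),
    ],
    "Policies": [
        ("Future and AI", "https://futureplc.com/policies/", "AI policy document"),
    ],
}

print(render_llms_txt(
    "Future PLC",
    "Corporate site of Future PLC, a media group whose brands include TechRadar, Tom's Guide, and PC Gamer.",
    SECTIONS,
))
```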

Findings

  1. Malformed robots.txt fails to block most AI crawlers High

    The robots.txt has a multi-user-agent block that names 11 AI crawlers followed by two Meta agents, but Disallow: / takes effect only for those last two. The first 11 bots inherit only the default rules, leaving them unblocked from the homepage and content. GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, ChatGPT-User, PerplexityBot, and anthropic-ai are entirely absent from the file.

    What to change: Restructure the robots.txt to correctly apply Disallow: / to all AI crawlers that should be blocked, and explicitly list or allow desired AI crawlers like GPTBot and ClaudeBot.

  2. ClaudeBot rate-limited by Cloudflare while anthropic-ai allowed High

    ClaudeBot receives a Cloudflare 429 (rate-limited) despite no robots.txt prohibition, while anthropic-ai gets a full 200 response. Both user-agent tokens are associated with Anthropic, so the same operator is throttled under one token and served under the other.

    What to change: Review Cloudflare WAF rules to ensure consistent treatment of Anthropic crawlers, or explicitly allow ClaudeBot in robots.txt and whitelist its user-agent in Cloudflare.

  3. No llms.txt file available Medium

    The site returns a 404 for llms.txt, missing an opportunity to provide AI crawlers with a structured summary of the site's content and resources.

    What to change: Create an llms.txt file that lists key pages, brand portfolio, and AI guidance for crawlers.

  4. Homepage dateModified set to future date Medium

    The homepage's dateModified is set to 2026-05-14, a future date, which may confuse temporal reasoning in AI parsers and reduce trust in the site's freshness signals.

    What to change: Update the dateModified to the actual last modification date or remove it if not accurately maintained.

  5. No AI-specific content strategy page Medium

    The site lacks a dedicated page or section explaining its AI stance, usage, or guidance. The only AI policy is a PDF buried under /policies/ and not linked from prominent navigation.

    What to change: Create a visible AI policy page linked from the homepage and footer, and consider adding an AI FAQ or guidance page.

  6. Brand portfolio lacks structured data Medium

    The brands page lists 50+ media properties with audience metrics using plain H3 headings, with no ItemList or CollectionPage schema to structure the portfolio for AI extraction.

    What to change: Add ItemList or CollectionPage schema to the brands page, with each brand as a ListItem containing name, description, and audience metrics.

  7. No WebSite schema with isAccessibleForFree or potentialAction Medium

    The site lacks WebSite schema that includes isAccessibleForFree or potentialAction for AI consumption patterns, reducing clarity for AI crawlers about content accessibility.

    What to change: Add WebSite schema with isAccessibleForFree set to true and a SearchAction pointing to a proper search endpoint.

  8. Corporate site omits layoffs and restructuring known to LLMs Low

    LLM prior knowledge includes 2023-2024 layoffs, print closures, and criticism of affiliate-heavy editorial, but the corporate site presents a polished narrative without acknowledging these events, creating a gap between AI knowledge and site content.

  9. Minimal external link profile Low

    The site only links to X (Twitter) and LinkedIn in the footer, with no press mentions, Reddit threads, or review citations surfaced in search, limiting external signals for AI crawlers.

    What to change: Consider adding links to press coverage, industry awards, or partner sites to strengthen external signals.

  10. SearchAction schema uses WordPress default endpoint Low

    The SearchAction schema targets ?s={search_term_string}, which is a WordPress default and may not be the intended search endpoint for AI crawlers.

    What to change: Update the SearchAction target to a proper search URL or remove if not needed.
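
For finding 1, a restructured robots.txt can be checked before deployment. The sketch below parses a hypothetical candidate file with Python's standard urllib.robotparser and lists which agents it would block from the homepage. CANDIDATE is an illustrative shortened file, not the site's actual or recommended content, and the helper name blocked is made up for this example. Python's parser is one interpretation of the format, so individual crawlers may still behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical candidate: consecutive User-agent lines share one rule group.
CANDIDATE = """\
User-agent: *
Disallow: /search

User-agent: bytespider
User-agent: diffbot
User-agent: meta-externalagent
Disallow: /
"""


def blocked(robots_txt, agents, url):
    """Return the agents that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


agents = ["bytespider", "diffbot", "meta-externalagent", "GPTBot"]
print(blocked(CANDIDATE, agents, "https://futureplc.com/"))
# prints ['bytespider', 'diffbot', 'meta-externalagent']
```

GPTBot is not named in the candidate, so it falls through to the wildcard group and stays allowed on the homepage. Whether that is the desired outcome is a policy decision for the site owner.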

What's working

  • Organization schema correctly implemented — Every page carries an Organization schema with Future PLC name and logo URL, providing consistent brand identity for AI parsers.
  • Active news publishing through May 2026 — The news section shows recent articles including half-year results, indicating the site is well-maintained and regularly updated.
  • Sitemap available with 80 URLs — The sitemap is accessible and contains 80 URLs, helping crawlers discover key pages.
  • AI guidance PDF exists and is accessible — A PDF titled 'Future AI Guidance for Future PLC site' is available at /wp-content/uploads/2024/12/... and can be discovered via sitemap, providing some AI policy documentation.
  • BreadcrumbList schema present on pages — Every page includes BreadcrumbList schema, aiding navigation understanding for AI crawlers.
  • anthropic-ai user agent allowed and returns full content — The anthropic-ai user-agent is not blocked by robots.txt and receives a full 200 response with the complete HTML page, so requests under that token can retrieve the site's content.
