euronews.com — AI Site Grade
Euronews's robots.txt explicitly disallows GPTBot, CCBot, and Google-Extended, yet every AI crawler user agent tested receives HTTP 200 with an identically sized response, so the disallow directives are not backed by any server-side blocking.
Access is therefore open in practice, but the raw HTML depends on JavaScript rendering: crawlers that do not execute JavaScript receive an empty shell instead of article content.
- Findings: 6
- Evidence checks: 30
- Completed: 30 May 2026
Analysis
Euronews's robots.txt explicitly disallows GPTBot and Google-Extended, but every AI crawler user agent tested, including those two, receives HTTP 200 with a response of identical size (~587KB), so the disallow directives are not enforced at the server level.
Crawler Access
The robots.txt at euronews.com/robots.txt contains a detailed AI-bot policy: OAI-SearchBot, ChatGPT-User, PerplexityBot, FirecrawlAgent, AndiBot, ExaBot, PhindBot, and YouBot are explicitly allowed (Allow: /), while GPTBot, CCBot, and Google-Extended are explicitly disallowed (Disallow: /). However, compare_bot_access on the homepage shows that every bot user agent tested (GPTBot, Google-Extended, ClaudeBot, PerplexityBot, OAI-SearchBot, Bytespider, Applebot-Extended, and anthropic-ai) receives HTTP 200 with an identical response size (~587KB). No bot receives a 403, 429, or redirect. The disallow rules act only as a signal that compliant crawlers are expected to honor; the server itself does not enforce them.
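The gap between declared policy and observed responses can be checked mechanically. The following is a minimal sketch, assuming Python's standard urllib.robotparser. SAMPLE_ROBOTS is an illustrative excerpt built from the rules described above, not the live file, and the observed statuses are hard-coded from this audit rather than fetched.

```python
from urllib.robotparser import RobotFileParser

# Illustrative excerpt only (assumption): the real robots.txt is longer.
SAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def robots_policy(robots_text: str, bots: list[str], path: str = "/") -> dict[str, bool]:
    """Return {bot: allowed} according to the robots.txt rules alone."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


def unenforced(policy: dict[str, bool], status_by_bot: dict[str, int]) -> list[str]:
    """Bots that robots.txt disallows but that still received a 2xx response."""
    return sorted(
        bot
        for bot, allowed in policy.items()
        if not allowed and 200 <= status_by_bot.get(bot, 0) < 300
    )


bots = ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot"]
policy = robots_policy(SAMPLE_ROBOTS, bots)
observed = {bot: 200 for bot in bots}  # statuses as reported in this audit
print(unenforced(policy, observed))
```

A bot with no matching group and no wildcard group (ClaudeBot in this excerpt) is treated as allowed, so only the explicitly disallowed bots can appear in the result.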
JS-Rendering Risk
All raw fetch_as_bot requests (including one sent with a browser user agent) return a thin HTML shell containing obfuscated JavaScript from html-load.com, an anti-adblock interstitial loader. The actual article content is loaded dynamically by client-side JavaScript. The fetch_url tool (which uses a more complete browser-like fetch) successfully extracts 667+ words of article text and full JSON-LD, but a simple HTTP GET by an AI crawler that does not execute JavaScript receives only the obfuscated loader script. Any AI crawler that does not render JavaScript therefore gets a blank shell rather than the article content, whether or not robots.txt allows it.
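One way to flag this condition is to count the words a non-rendering client can actually see in the raw HTML. The sketch below uses only the standard library. The function names and the 200-word threshold are assumptions chosen for illustration, not part of the audit tooling.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script and style bodies arrive here too, so skip them explicitly.
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    """Number of whitespace-separated words visible without running JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def looks_like_js_shell(html: str, min_words: int = 200) -> bool:
    """Heuristic: a response with very little visible text is probably a shell."""
    return visible_word_count(html) < min_words
```

Applied to the raw response for an article URL, a count near zero would match the shell behaviour described above, while a count in the hundreds would suggest the article text is present in the initial HTML.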
Cold-Knowledge Gap
The LLM's cold knowledge identifies Euronews as a multilingual news network headquartered in Lyon and founded in 1993, and it reports a 2022 acquisition by Al-Masry Al-Youm (via Alpac Capital). It flags reputational concerns about pro-Russian slant in the Arabic service and post-acquisition layoffs. The site does not address these controversies anywhere on the /about page: it describes itself as "unapologetically impartial and independent" and lists its shareholders as Alpac Capital (97.6%), ADMIC (Abu Dhabi), SNRT (Morocco), and PBS (Malta), without mentioning the Egyptian ownership connection that the model reports.
Schema Posture
Articles use NewsArticle schema with full articleBody, datePublished, dateModified, image, speakable, and publisher references. The homepage and vertical pages carry WebSite and Organization schema with SearchAction and sameAs links. No FAQPage, HowTo, Product, or BreadcrumbList schema was found on the homepage (though articles have breadcrumb JSON-LD). The schema is well structured. Articles carry no isAccessibleForFree or hasPart properties; because the site appears fully open, the omission does not misstate access, but it leaves the access status implicit.
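A check of this kind can be reproduced by extracting the JSON-LD blocks from a page and comparing the NewsArticle node against the properties listed above. This is a minimal standard-library sketch. The EXPECTED tuple mirrors the fields named in this section, and the helper names are hypothetical.

```python
import json
from html.parser import HTMLParser

# Properties this audit looked for on NewsArticle nodes.
EXPECTED = ("articleBody", "datePublished", "dateModified",
            "image", "speakable", "publisher")


class JsonLdExtractor(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the scan


def iter_nodes(block):
    """Yield each node in a block: a single object, a list, or an @graph."""
    items = block if isinstance(block, list) else [block]
    for item in items:
        if isinstance(item, dict):
            yield item
            yield from iter_nodes(item.get("@graph", []))


def missing_newsarticle_fields(html: str):
    """Missing EXPECTED fields on the first NewsArticle node, or None if absent."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        for node in iter_nodes(block):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "NewsArticle" in types:
                return [field for field in EXPECTED if field not in node]
    return None
```

An empty list means every expected property is present; None means no NewsArticle node was found in the HTML that was fetched, which is what a non-rendering crawler would see if the JSON-LD is injected by JavaScript.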
External Signals
The DNS TXT records include an anthropic-domain-verification-9mbd5w token, indicating that Euronews has verified domain ownership with Anthropic. The record alone does not show what the verification is used for, so any link to Claude's training data pipeline is an inference. Multiple google-site-verification tokens and an apple-domain-verification record are also present. The llms.txt endpoint returns HTTP 406 (Not Acceptable), so the AI content map is either missing or misconfigured. The sitemap structure is comprehensive, with per-year article sitemaps from 2008 through 2026 plus language-specific sitemaps, but the root sitemap references http:// URLs (not https://) for several sub-sitemaps, a minor canonical inconsistency.
Findings
Article pages render as empty JavaScript shells for non-rendering crawlers (Severity: High)
All raw HTTP GET requests (including one sent with a browser user agent) return a thin HTML shell with an obfuscated anti-adblock interstitial from html-load.com. Actual article content is loaded dynamically by client-side JavaScript. AI crawlers that do not execute JavaScript receive only the loader script, not the article text.
What to change: Implement server-side rendering (SSR) or prerendering for AI crawlers so that article content is included in the initial HTML response. Remove the html-load.com interstitial for known bot user agents.
Robots.txt disallow directives for GPTBot and Google-Extended are not enforced by the server (Severity: High)
The robots.txt explicitly disallows GPTBot, CCBot, and Google-Extended, but all tested AI crawler user agents receive HTTP 200 with an identical response size (~587KB). The server does not block or redirect any bot, making the disallow rules purely symbolic.
What to change: Configure the web server to enforce robots.txt disallow rules by returning HTTP 403 or 429 for disallowed bots, or remove the disallow directives if the intent is to allow all crawlers.
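Server-side enforcement could take many forms (CDN rule, reverse-proxy config, application middleware). As one hedged illustration, the sketch below is a generic WSGI middleware that returns 403 when the User-Agent header contains a blocked token; nothing here reflects Euronews's actual stack, and the names are hypothetical. One caveat: Google documents Google-Extended as a robots.txt control token rather than a separate HTTP user agent, so a User-Agent match is only expected to catch crawlers such as GPTBot and CCBot.

```python
from wsgiref.util import setup_testing_defaults

# Lowercase substrings to match in the User-Agent header (assumption).
# Google-Extended is omitted: it is a robots.txt token, not a request UA.
BLOCKED_TOKENS = ("gptbot", "ccbot")


def block_disallowed_bots(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app so requests from blocked crawlers get 403 Forbidden."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            body = b"Forbidden for this crawler\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"article\n"]


def status_for(user_agent: str) -> str:
    """Run one fake request through the middleware and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    app = block_disallowed_bots(demo_app)
    list(app(environ, lambda status, headers: seen.append(status)))
    return seen[0]
```

User-Agent strings are trivially spoofed, so a rule like this enforces the policy only against crawlers that identify themselves honestly.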
llms.txt endpoint returns HTTP 406, missing AI content map (Severity: Medium)
The llms.txt file at euronews.com/llms.txt returns HTTP 406 (Not Acceptable) with an empty body. This means there is no AI-specific content map available for language models to discover structured information about the site.
What to change: Create and serve a valid llms.txt file that provides a summary of the site and links to key sections, following the llms.txt proposal.
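The llms.txt proposal describes a Markdown file with an H1 title, a blockquote summary, and H2 sections containing link lists. A minimal sketch of a generator follows; the function name is hypothetical, and the title, summary, and example.com links are placeholders rather than real Euronews content.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render llms.txt-style Markdown: H1 title, blockquote summary, H2 link lists.

    `sections` maps a heading to a list of (name, url, note) tuples;
    `note` may be an empty string.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content (assumption): replace with the site's real sections.
example = render_llms_txt(
    "Example News",
    "Multilingual news coverage with open article access.",
    {
        "Key pages": [
            ("About", "https://example.com/about", "Ownership and editorial policy"),
            ("Sitemap", "https://example.com/sitemap.xml", ""),
        ],
    },
)
print(example)
```

The rendered file would be served at /llms.txt as plain text with a 200 status, so that a request no longer ends in the 406 response observed here.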
About page omits ownership controversy known to AI models (Severity: Medium)
The /about page describes Euronews as "unapologetically impartial and independent" and lists shareholders as Alpac Capital, ADMIC, SNRT, and PBS, without mentioning the Egyptian ownership connection (Al-Masry Al-Youm via Alpac Capital) that the LLM's cold knowledge flags as a reputational concern. This discrepancy may lead to fabricated or contradictory information in AI-generated summaries.
What to change: Update the /about page to transparently address the ownership structure and any editorial independence safeguards, providing a clear statement that AI models can reference.
Root sitemap references HTTP URLs for some sub-sitemaps (Severity: Low)
The root sitemap at /sitemaps/en/sitemap.xml contains URLs using http:// instead of https:// for several sub-sitemaps, creating a canonical inconsistency that may confuse crawlers.
What to change: Update the root sitemap to use https:// URLs consistently for all sub-sitemap references.
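Entries of this kind can be listed with a few lines of standard-library XML parsing. The sketch below assumes a sitemap index in the sitemaps.org 0.9 namespace. SAMPLE_INDEX uses example.com placeholders rather than the real sub-sitemap URLs, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The namespace identifier is itself an http:// URI and must stay that way.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Placeholder index (assumption): not the real Euronews sitemap.
SAMPLE_INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://example.com/sitemaps/2008.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemaps/2026.xml</loc></sitemap>
</sitemapindex>
"""


def insecure_sitemap_locs(index_xml: str) -> list[str]:
    """Return every <loc> value in the index that uses the http:// scheme."""
    root = ET.fromstring(index_xml)
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return [url for url in locs if url.lower().startswith("http://")]


def to_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving other URLs unchanged."""
    if url.lower().startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


for url in insecure_sitemap_locs(SAMPLE_INDEX):
    print(url, "->", to_https(url))
```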
No isAccessibleForFree or hasPart schema on articles (Severity: Low)
Articles use NewsArticle schema but lack isAccessibleForFree or hasPart properties to indicate paywall or registration status. While the site appears fully open, adding these properties would provide clarity for AI systems.
What to change: Add isAccessibleForFree to NewsArticle schema to signal open access explicitly, and add hasPart only for any sections that are gated behind a paywall or registration.
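As a hedged illustration of the two cases, the sketch below adds the properties to a NewsArticle node represented as a Python dict. The function names and the ".paywall" selector are hypothetical; the property shapes follow the common schema.org pattern for marking gated content (a WebPageElement with cssSelector).

```python
import json


def mark_free_access(article: dict) -> dict:
    """Return a copy of a NewsArticle node flagged as freely accessible."""
    if article.get("@type") != "NewsArticle":
        raise ValueError("expected a NewsArticle node")
    marked = dict(article)
    marked["isAccessibleForFree"] = True
    return marked


def mark_gated_section(article: dict, css_selector: str) -> dict:
    """Return a copy flagged as gated, with hasPart naming the gated element."""
    if article.get("@type") != "NewsArticle":
        raise ValueError("expected a NewsArticle node")
    marked = dict(article)
    marked["isAccessibleForFree"] = False
    marked["hasPart"] = {
        "@type": "WebPageElement",
        "isAccessibleForFree": False,
        "cssSelector": css_selector,  # hypothetical selector for the gated block
    }
    return marked


# Placeholder node (assumption): real articles carry many more properties.
node = {"@context": "https://schema.org", "@type": "NewsArticle", "headline": "Example"}
print(json.dumps(mark_free_access(node), indent=2))
```

For a site that is fully open, only the first function would apply; the second is shown for completeness in case some sections are later gated.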
What's working
- Articles include full NewsArticle schema with articleBody and speakable: Article pages contain NewsArticle JSON-LD with complete fields including articleBody, datePublished, dateModified, image, speakable, and publisher. This provides rich structured data for AI systems.
- Domain verified with Anthropic: DNS TXT records include an anthropic-domain-verification token, indicating proactive domain ownership verification with Anthropic. The record alone does not show whether it relates to Claude's training data usage.
- Sitemap structure covers articles from 2008 to 2026 across multiple languages: The sitemap includes per-year article sitemaps from 2008 through 2026 and language-specific sitemaps, providing thorough coverage for crawlers to discover historical and current content.
- Robots.txt explicitly allows several AI bots including OAI-SearchBot and PerplexityBot: The robots.txt file explicitly allows OAI-SearchBot, ChatGPT-User, PerplexityBot, FirecrawlAgent, AndiBot, ExaBot, PhindBot, and YouBot, signaling a welcoming stance toward many AI crawlers.
- Multiple Google site verification tokens present in DNS: DNS TXT records contain multiple google-site-verification tokens, confirming domain ownership verification with Google for Search Console and other services.