clark.io — AI Site Grade
Clark.io's thin corporate brochure and missing structured data leave AI crawlers with stale, low-signal brand knowledge despite full crawler access.
Clark.io grants unrestricted access to all major AI crawlers but lacks llms.txt, Organization schema, and substantive content, causing LLMs to understate the company's scale by 4x.
- Findings: 9
- Evidence checks: 23
- Completed: 30 May 2026
Analysis
clark.io AI-Visibility Audit
The corporate site at clark.io is fully open to every major AI crawler: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, and ChatGPT-User all receive a 200 response with identical full HTML. Yet the site has no llms.txt, no AI-specific robots.txt directives, and no structured data beyond generic WebPage/WebSite schema, so AI engines are left to extract a generic, low-signal brand narrative from a thin corporate brochure.
Crawler Access
Of the 11 bot UAs tested, 10 return the same 124KB HTML payload as a browser; the exception is Bytespider, which gets a 403 from Cloudflare. The robots.txt is a bare WordPress default (Disallow: /wp-admin/, /?s=, /search/) with zero AI-bot rules: GPTBot, ClaudeBot, and Google-Extended are not mentioned. The llms.txt path returns 404. The site runs on Cloudflare with HSTS and CSP (frame-ancestors 'self'), but no AI-specific access controls or content-negotiation signals exist. The DNS TXT records include an anthropic-domain-verification token, indicating the brand has engaged with Anthropic's ecosystem at some point, yet no corresponding llms.txt or AI-friendly content map was deployed.
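To illustrate why the absence of AI-bot rules leaves access fully open, the sketch below feeds a reconstructed robots.txt into Python's standard-library parser. The file body is an assumption pieced together from the description above (the live file may differ), and access_report is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the robots.txt described in this audit;
# the live file may contain additional lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /search/

Sitemap: https://clark.io/sitemap_index.xml
"""

AI_BOTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "OAI-SearchBot", "ChatGPT-User",
]


def access_report(robots_txt, agents, url):
    """Return {agent: allowed?} for one URL under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# No bot has its own group, so every agent falls through to the "*" rules.
homepage = access_report(ROBOTS_TXT, AI_BOTS, "https://clark.io/")
admin = access_report(ROBOTS_TXT, AI_BOTS, "https://clark.io/wp-admin/")
```

Under these assumptions every listed bot is allowed on the homepage and blocked only from /wp-admin/, the same as any other crawler. Note that robots.txt is advisory; the 403 that Bytespider receives comes from Cloudflare, not from this file.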
Cold-Knowledge Gap
The LLM's prior knowledge describes Clark as a German fintech with ~500,000 active users and a consumer insurance app. The actual site claims over 2 million customers across five European markets, 600+ employees, and positions itself as "one of the largest global InsurTechs" and "Europe's leading digital insurance broker." The cold model knows nothing about the CEO transition (Benedikt Kalteier succeeded founder Christopher Oster in June 2024), the M&A history (five acquisitions since 2018, including UB Partner, Schutzklick, That's Life, and Anorak), the UK brands (Polly, Tom, Winston), or the positive operating result achieved in 2024. The model's user count (500K) is a quarter of the site's stated 2 million (2,000,000 / 500,000 = 4), a material gap that means AI-generated summaries will systematically understate the company's scale.
Schema Posture
Every page carries WebPage, WebSite, and BreadcrumbList schema from Yoast SEO, and press articles add Article schema, but nothing more. There is no Organization schema with logo, founding date, social profiles, or employee count. No FAQPage, Product, SoftwareApplication, or LocalBusiness markup exists anywhere on the site. The press articles' Article schema lacks author Person markup with proper URLs. The homepage has zero JSON-LD types beyond the generic graph: no Organization, no Corporation, no InsuranceAgency (schema.org's closest type for a broker). For a company that explicitly calls itself "Europe's leading digital insurance broker," the absence of broker-specific or organization-level schema is a significant missed signal for knowledge graph construction.
Content & Structure
The site is a 12-page WordPress brochure with no blog, no FAQ, no comparison tables, no pricing, and no product documentation. The homepage text is 526 words of aspirational copy ("protect your world," "insurance expert in your pocket") with no concrete data points beyond the 2M customer and 600 employee claims. The press section is the most substantive area, containing 10 dated press releases (latest: February 2025) with real news: CEO handover, acquisitions, UK product launches, positive operating result. The innovation page mentions "AI" and "smart algorithms" in passing but provides no technical detail, API documentation, or case studies that an AI crawler could extract for a rich answer. There is no sitemap at /sitemap.xml (404). The actual sitemap lives at /sitemap_index.xml and is correctly referenced in robots.txt, but crawlers that request the conventional /sitemap.xml path get a 404.
External Signals
The DNS records reveal a sophisticated tech stack: Google Workspace (MX), AWS Route53 (NS), and integrations with Atlassian, Miro, Zendesk, Greenhouse, and Redpoints. The anthropic-domain-verification TXT record is notable: it indicates Clark proactively verified its domain with Anthropic, likely for Claude-powered features, yet the public site has no llms.txt or AI-facing content strategy. No significant external reviews, Reddit threads, or press coverage surfaced in search, suggesting the brand's AI footprint is shaped almost entirely by what the LLM already knows from pre-training data (which is stale) and what the corporate site serves (which is thin).
Findings
No llms.txt file published Medium
The site returns a 404 for llms.txt, missing an opportunity to guide AI crawlers to key content and facts.
What to change: Publish an llms.txt file at the root that lists important pages and factual summaries for AI consumption.
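As a rough sketch of what such a file could contain, the snippet below assembles an llms.txt body following the layout of the public llms.txt proposal (a title, a blockquote summary, then sections of annotated links). The helper name build_llms_txt and the two page URLs are hypothetical; the facts in the summary are the ones the site itself states.

```python
def build_llms_txt(name, summary, sections):
    """Assemble llms.txt text: H1 title, blockquote summary, link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    name="Clark",
    summary=(
        "Digital insurance broker. The site states over 2 million customers "
        "across five European markets and 600+ employees."
    ),
    sections={
        "Company": [
            # Hypothetical URLs: the audit names these sections, not their paths.
            ("Press", "https://clark.io/press/",
             "Dated press releases on the CEO handover, acquisitions, and results"),
            ("Innovation", "https://clark.io/innovation/",
             "Overview of the product and technology approach"),
        ],
    },
)
```

The resulting text would be served as a static file at the site root. Keeping the summary limited to claims already published on the site avoids introducing facts that crawlers cannot verify elsewhere.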
LLM prior knowledge understates customer count by 4x High
The LLM's pre-training knowledge estimates ~500,000 active users, while the site claims over 2 million customers. This gap means AI-generated summaries will systematically underreport the company's scale.
What to change: Add Organization schema with employee count, customer count, and founding date to the homepage to help close the knowledge gap.
No Organization schema on any page High
Despite being a company with 600+ employees and 2M customers, no page includes Organization, Corporation, or InsuranceAgency (schema.org's closest type for a broker) schema. The homepage only has generic WebPage/WebSite markup.
What to change: Add Organization schema with logo, founding date, social profiles, employee count, and customer count to the homepage and about page.
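A minimal sketch of the markup this recommendation implies, built in Python so the values can be checked before they are embedded. The helper names are hypothetical. Two assumptions are worth flagging: schema.org has, as far as I know, no dedicated customer-count property, so that figure goes into description here; and foundingDate, logo, and sameAs are left out because this audit does not state their values.

```python
import json


def organization_jsonld(name, url, description, min_employees,
                        logo_url=None, same_as=()):
    """Build an Organization JSON-LD dict; optional fields are omitted if unset."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
        # "600+" is a lower bound, so it is expressed as minValue.
        "numberOfEmployees": {
            "@type": "QuantitativeValue",
            "minValue": min_employees,
        },
    }
    if logo_url:
        data["logo"] = logo_url
    if same_as:
        data["sameAs"] = list(same_as)
    return data


def as_script_tag(data):
    """Wrap JSON-LD in the script element used to embed it in a page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


org = organization_jsonld(
    name="Clark",
    url="https://clark.io/",
    description=(
        "Digital insurance broker with over 2 million customers "
        "across five European markets."
    ),
    min_employees=600,
)
tag = as_script_tag(org)
```

Since the site already emits a Yoast graph, the same fields could alternatively be added through that plugin's organization settings rather than as a separate script element.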
No FAQPage, Product, or SoftwareApplication schema Medium
The site offers no structured data for FAQs, products, or the insurance app, missing opportunities for rich results and knowledge graph enrichment.
What to change: Add FAQPage schema to any FAQ content, Product schema for insurance offerings, and SoftwareApplication schema for the mobile app.
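For the FAQ part of this recommendation, a small sketch of FAQPage markup generated from question/answer pairs. The helper name faq_jsonld is hypothetical and the wording of the two sample questions is illustrative; the answers reuse only facts reported earlier in this audit, and the markup should mirror FAQ text that is actually visible on the page.

```python
import json


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) string pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Illustrative Q&A wording; the facts come from the site's own claims.
faq = faq_jsonld([
    ("How many customers does Clark have?",
     "Clark states that it has over 2 million customers across "
     "five European markets."),
    ("Who is the CEO of Clark?",
     "Benedikt Kalteier, who succeeded founder Christopher Oster "
     "in June 2024."),
])
faq_json = json.dumps(faq, indent=2, ensure_ascii=False)
```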
Site is a 12-page brochure with no substantive content High
The site has no blog, FAQ, comparison tables, pricing, or product documentation. The homepage is 526 words of aspirational copy with few concrete data points. This limits the depth of information AI crawlers can extract.
What to change: Expand the site with a blog, FAQ section, and detailed product pages that provide rich, factual content for AI crawlers.
Canonical sitemap URL returns 404 Low
The standard sitemap location /sitemap.xml returns a 404, though the actual sitemap exists at /sitemap_index.xml. This may confuse crawlers that expect the canonical path.
What to change: Redirect /sitemap.xml to /sitemap_index.xml or serve the sitemap at the canonical location.
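The low severity follows from how sitemap discovery works: crawlers that read robots.txt find the declared sitemap, and only those that guess the conventional path hit the 404. The sketch below shows that check using the standard library (site_maps() requires Python 3.8 or later). The robots.txt body is an assumed reconstruction, and needs_redirect is a hypothetical helper.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed reconstruction; only the Sitemap line matters for this check.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://clark.io/sitemap_index.xml
"""


def sitemap_urls(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() is None when none declared


def needs_redirect(robots_txt, conventional_path="/sitemap.xml"):
    """True when no declared sitemap sits at the conventional path."""
    declared_paths = {urlsplit(url).path for url in sitemap_urls(robots_txt)}
    return conventional_path not in declared_paths


declared = sitemap_urls(ROBOTS_TXT)
redirect_recommended = needs_redirect(ROBOTS_TXT)
```

With the file as described, the declared sitemap is /sitemap_index.xml only, so a redirect from /sitemap.xml is the piece that covers crawlers which skip robots.txt.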
Robots.txt has no AI-bot directives Low
The robots.txt only disallows WordPress admin and search paths. No rules for GPTBot, ClaudeBot, Google-Extended, or other AI crawlers exist, leaving access fully open but unmanaged.
What to change: Consider adding explicit allow/disallow rules for AI crawlers to manage crawl budget and signal intent.
Press articles lack author Person markup Low
Press releases use Article schema but do not include author Person markup with proper URLs, reducing the credibility signal for AI knowledge graphs.
What to change: Add author Person schema with name and URL to all press articles.
No significant external reviews or press coverage found Medium
Web searches for reviews, Reddit threads, and press coverage returned no significant results, indicating the brand's AI footprint is shaped almost entirely by its own thin site and stale pre-training data.
What to change: Encourage customer reviews on third-party platforms and engage with press to generate external signals.
What's working
- All major AI crawlers receive full HTML content — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and others return 200 with identical full HTML, ensuring no content is blocked.
- Domain verified with Anthropic via TXT record — The DNS TXT records include an anthropic-domain-verification token, indicating proactive engagement with Anthropic's ecosystem.
- Press section contains substantive news articles — The press page includes 10 dated press releases with real news about CEO transition, acquisitions, and financial results, providing factual content for AI crawlers.
- WebPage, WebSite, and BreadcrumbList schema present — Every page includes basic Yoast SEO schema, providing minimal structured data for search engines.
- Cloudflare with HSTS and CSP provides security — The site uses Cloudflare with HSTS and Content Security Policy, ensuring secure connections and some protection against clickjacking.