sanity.io — AI Site Grade

Sanity.io serves full HTML to every AI crawler tested and publishes a comprehensive llms.txt. Its AI visibility is limited by three gaps: no JSON-LD schema on the homepage or major landing pages, an /answers page that renders as a JavaScript shell, and a cold-knowledge gap between its 'headless CMS' reputation and its 'Content Operating System for AI' positioning.

Findings: 6
Evidence checks: 22
Completed: 30 May 2026

Analysis

Sanity.io: a site that does everything right for AI crawlers except tell them who it is

Every AI crawler tested (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended, anthropic-ai, ChatGPT-User, Perplexity-User) receives a 200 with full HTML content (760-768 KB) from Vercel, identical to what a browser receives. There is no UA-based blocking, no JS shell, and no Cloudflare challenge. The robots.txt allows all AI bots and carries a Content-Signal: ai-train=yes, search=yes, ai-input=yes directive. An exhaustive llms.txt (127 KB) lists hundreds of docs pages with descriptions, and the sitemap index spans 6 sub-sitemaps covering thousands of URLs. Yet the homepage and every major landing page checked (/studio, /content-agent, /pricing, /customers, /blog) carry zero JSON-LD schema: no Organization, no WebSite, no Product, no FAQPage. The only schema this audit found on the domain lives deep in a docs article (/docs/ai/mcp-server), which embeds an Article + Organization block. That Organization schema is the single structured-data declaration for the brand on the entire site.
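The robots.txt claim above can be spot-checked offline with Python's standard-library parser. The sketch below is illustrative only: SAMPLE_ROBOTS is a hypothetical file modelled on the rules this report describes, not a copy of the live robots.txt, and the bot list is a subset of the user agents named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modelled on the rules described in this report.
# The real file may differ in wording, grouping, and order.
SAMPLE_ROBOTS = """\
User-agent: *
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Disallow: /debug
Disallow: /styleguide
Disallow: /manage
"""

# Subset of the user agents named in the report.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "OAI-SearchBot"]


def allowed_bots(robots_txt: str, path: str, bots: list[str]) -> dict[str, bool]:
    """Return, per bot, whether the given robots.txt text permits fetching path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


if __name__ == "__main__":
    for bot, ok in allowed_bots(SAMPLE_ROBOTS, "/", AI_BOTS).items():
        print(f"{bot}: {'allowed' if ok else 'blocked'}")
```

Note that urllib.robotparser silently skips directives it does not know, so the Content-Signal line has no effect on the result here.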

Crawler Access

All 11 bot UAs return 200 with full content (760-768 KB, server: Vercel). The robots.txt has no AI-specific disallow rules — only generic blocks for /debug, /styleguide, /manage, /showcase, /api/preview, /v2-docs, and JSON paths. The llms.txt at /llms.txt is a comprehensive 127 KB document listing docs, guides, and resources with descriptions. The /answers page, however, is a JS shell — a plain GET returns only 2 words of visible text ("Loading answers…") despite being listed in the sitemap. AI crawlers that do not execute JavaScript will index this as an empty page.
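One way to reproduce the JS-shell observation is to count the words a crawler sees without executing JavaScript. The following is a minimal sketch using only the standard library. The SHELL markup is a made-up stand-in for the /answers response, and the 50-word threshold is an arbitrary assumption, not a figure from this audit.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect words that appear outside non-rendered elements."""

    SKIPPED = {"script", "style", "noscript", "title"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that is not nested inside a skipped element.
        if self._skip_depth == 0:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def looks_like_js_shell(html: str, min_words: int = 50) -> bool:
    """Flag pages whose raw HTML carries almost no readable text."""
    return visible_word_count(html) < min_words


# Hypothetical markup standing in for a client-rendered page.
SHELL = (
    "<html><head><script>window.x = 1;</script></head>"
    '<body><div id="root">Loading answers…</div>'
    '<script src="/app.js"></script></body></html>'
)

if __name__ == "__main__":
    print(visible_word_count(SHELL), looks_like_js_shell(SHELL))
```

Running the same check against a server-rendered page should return a word count well above the threshold, which makes it usable as a regression test after a rendering fix.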

Cold-Knowledge Gap

The LLM prior knows Sanity as a "headless CMS" founded in 2014, with Sanity Studio, GROQ, and customers like Nike and Figma. The site itself now positions as "The Content Operating System for the AI era" — a fundamentally different category claim. The homepage H1 is "Structure powers intelligence," and the entire navigation leads with AI products: Content Agent, Agent Actions, Agent Context, MCP Server, Functions. The cold model knows nothing about Content Agent, the MCP server, Agent Context, Canvas, Blueprints, or Functions. The gap between "headless CMS" (what AI knows) and "Content Operating System for AI" (what the site says) is the single largest positioning disconnect.

Schema Posture

The homepage, /studio, /content-agent, /pricing, /customers, and /blog all return zero JSON-LD schema of any type. The only structured data on the entire domain lives on /docs/ai/mcp-server, which carries an Article + Organization block. That Organization schema contains rich data: founding date "1995" (likely incorrect — Sanity was founded in 2014), locations in Oslo and San Francisco, sameAs links, and contact points. But this schema is buried 3 levels deep in the docs, not on the homepage or any top-level page. No FAQPage schema exists despite the /pricing page containing extensive comparison tables and feature lists that would benefit from structured markup.
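A schema-posture check like the one described above can be approximated by extracting every JSON-LD block from a page and listing its @type values. This is a sketch under assumptions: it reads HTML from a string rather than fetching it, and the PAGE sample is invented to mirror the Article + Organization block the report mentions.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Gather the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the audit.


def schema_types(html: str) -> list[str]:
    """Return the sorted, de-duplicated @type values found in a page."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found: list[str] = []
    for block in collector.blocks:
        for node in block if isinstance(block, list) else [block]:
            if not isinstance(node, dict):
                continue
            # A block may wrap several entities in an @graph array.
            for item in node.get("@graph", [node]):
                if isinstance(item, dict):
                    types = item.get("@type", [])
                    found.extend([types] if isinstance(types, str) else types)
    return sorted(set(found))


# Invented sample that mirrors the block described for the docs article.
PAGE = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@graph": '
    '[{"@type": "Article"}, {"@type": "Organization"}]}'
    "</script></head><body></body></html>"
)

if __name__ == "__main__":
    print(schema_types(PAGE))
```

An empty list for a top-level page corresponds to the "zero JSON-LD schema" result reported here.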

External Signals

DNS TXT records reveal domain verification tokens for Anthropic, Cursor, Linear, Notion, Miro, and Apple — indicating active integrations with AI tooling ecosystems. The site runs on Google Cloud DNS, Google Workspace for email, and Vercel for hosting. The robots.txt includes a custom Content-Signal directive (non-standard, not recognized by any major crawler). The blog is actively publishing AI-focused content (March-May 2026 posts on agent context, MCP, Content Agent in Slack, TypeGen). Customer stories include PUMA, Mejuri, loveholidays, Complex, Morning Brew, and Tecovas — strong social proof that the cold LLM knowledge partially captures but underweights.

Findings

Zero JSON-LD schema on homepage and all major landing pages (High)
The homepage, /studio, /content-agent, /pricing, /customers, and /blog return no JSON-LD schema of any type. The only structured data on the domain is an Organization block buried in a docs article, leaving AI crawlers without explicit brand, product, or FAQ markup.
What to change: Add Organization, WebSite, and Product JSON-LD schema to the homepage and relevant landing pages. Add FAQPage schema to /pricing.

Cold LLM knowledge gap: 'headless CMS' vs 'Content Operating System for AI' (High)
The LLM prior knows Sanity as a headless CMS, but the site now positions as 'The Content Operating System for the AI era' with AI products like Content Agent and MCP Server. This disconnect means AI assistants may not surface the site for relevant AI queries.
What to change: Update the homepage and key pages with explicit schema markup (e.g., SoftwareApplication, Product) that reflects the new AI positioning, and ensure the llms.txt and sitemap emphasize AI-related content.

Answers page renders as a JavaScript shell for non-JS crawlers (Medium)
The /answers page returns only 2 words of visible text ('Loading answers…') when fetched without JavaScript, despite being listed in the sitemap. AI crawlers that do not execute JavaScript will index this as an empty page.
What to change: Implement server-side rendering or static generation for the /answers page so that AI crawlers receive full content without JavaScript execution.

Organization schema lists incorrect founding date (Medium)
The only Organization schema on the domain, located at /docs/ai/mcp-server, states a founding date of '1995', which appears to be incorrect (Sanity was founded in 2014). This could mislead AI crawlers and knowledge panels.
What to change: Correct the foundingDate in the Organization schema to '2014' and move the schema to the homepage.

Custom Content-Signal directive in robots.txt is non-standard (Low)
The robots.txt includes a 'Content-Signal: ai-train=yes, search=yes, ai-input=yes' directive. It is not part of the Robots Exclusion Protocol (RFC 9309), so crawlers are free to ignore it, and its effect on AI crawlers is unverified.
What to change: Remove the non-standard Content-Signal directive or replace it with standard mechanisms like meta tags or HTTP headers.

Pricing page lacks FAQPage schema despite extensive comparison tables (Medium)
The /pricing page contains detailed comparison tables and feature lists that would benefit from FAQPage structured data, but no such schema is present.
What to change: Add FAQPage JSON-LD schema to the pricing page, expressing the key plan and feature details as question-and-answer pairs.
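
The schema findings above could be addressed with a block along the following lines. This is a hedged sketch, not Sanity's actual markup: the function name is hypothetical, the URL and sameAs entry are placeholders, and the founding date is the value this report asserts, which should be verified before publishing.

```python
import json


def organization_jsonld(name: str, url: str, founding_date: str, same_as: list[str]) -> str:
    """Build a <script> tag holding linked Organization and WebSite entities."""
    org_id = f"{url}#organization"
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": name,
                "url": url,
                "foundingDate": founding_date,
                "sameAs": same_as,
            },
            {
                "@type": "WebSite",
                "@id": f"{url}#website",
                "name": name,
                "url": url,
                # Point the site at the organization defined above.
                "publisher": {"@id": org_id},
            },
        ],
    }
    # Escape "</" so a string value can never close the script element early.
    payload = json.dumps(graph, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(
        organization_jsonld(
            name="Sanity",
            url="https://www.sanity.io/",  # assumed canonical URL
            founding_date="2014",  # value asserted in this report; verify first
            same_as=["https://example.com/sanity-profile"],  # placeholder
        )
    )
```

Emitting the tag from one function keeps the Organization data in a single place, so the homepage and the docs article cannot drift apart the way the current founding date has.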

What's working

All 11 AI crawlers receive full HTML content with no blocking — Every tested AI bot (GPTBot, ClaudeBot, PerplexityBot, etc.) receives a 200 response with full HTML content (760-768 KB) from Vercel, identical to a browser. No UA-based blocking, JS shell, or Cloudflare challenges.
Comprehensive llms.txt (127 KB) with hundreds of docs pages — The /llms.txt file is a 127 KB document listing hundreds of documentation pages with descriptions, providing AI crawlers with a clear map of content.
Robots.txt allows all AI bots with no disallow rules — The robots.txt has no AI-specific disallow rules, only generic blocks for non-content paths. AI bots are fully permitted.
Sitemap index with 6 sub-sitemaps covering thousands of URLs — The sitemap index at /sitemap.xml lists 6 sub-sitemaps, ensuring broad coverage of the site's content for crawlers.
DNS verification tokens for Anthropic, Cursor, Linear, Notion, Miro, and Apple — DNS TXT records show domain verification tokens for multiple AI tooling ecosystems, indicating active integrations and trust signals.
Blog actively publishing AI-focused content (agent context, MCP, Content Agent) — The blog features recent posts on agent context, MCP server, Content Agent in Slack, and TypeGen, demonstrating thought leadership in AI content operations.
Customer stories include well-known brands (PUMA, Mejuri, Complex, Morning Brew) — The /customers page lists notable brands as customers, providing strong social proof that the cold LLM knowledge partially captures.
MCP server documentation includes Article and Organization schema — The /docs/ai/mcp-server page contains JSON-LD schema (Article + Organization), providing structured data for that specific page.
