AI Site Grade
cockroachlabs.com — AI Site Grade
Cockroach Labs' missing /llms.txt file undermines a sophisticated AI-crawler posture, creating a 48KB error shell that wastes crawler budget and blocks structured content delivery.
Cockroach Labs has strong AI-crawler access and schema foundations, but the missing /llms.txt and a cold-knowledge gap between its legacy positioning and current AI-agent narrative limit its AI visibility.
- Findings: 7
- Evidence checks: 23
- Completed: 30 May 2026
Analysis
Cockroach Labs has a sophisticated AI-crawler posture on paper that is undermined by a single missing file: /llms.txt returns a 404 served as a 48KB Next.js error shell, not a plain-text content map — and this 404 is identical for GPTBot, ClaudeBot, and every other AI crawler tested.
Crawler Access
All 11 AI bots tested (including GPTBot, ClaudeBot, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, OAI-SearchBot, ChatGPT-User, Bytespider, and Applebot-Extended) receive a 200 with full content on the homepage, identical to a browser baseline. The site runs on Netlify with Next.js server-side rendering, so there is no JS-rendering risk for crawlers. The robots.txt explicitly names GPTBot, PerplexityBot, ClaudeBot, anthropic-ai, and Google-Extended with Allow: / directives, which reads as a deliberate, well-maintained allowlist. The /free-tier/* path is disallowed for all bots, which blocks crawlers from a pricing-adjacent page that might contain competitive intelligence.
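The allowlist behaviour described above can be checked offline with Python's standard-library robots.txt parser. This is a minimal sketch, not the site's actual file: `SAMPLE_ROBOTS`, the `ExampleBot` agent, the `/free-tier/start` path, and the bare `https://cockroachlabs.com` origin are all assumptions made for illustration. The stdlib parser matches path prefixes and applies the first matching rule, so the sample drops the `*` wildcard and lists Disallow before Allow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modelled on the audit's description; the live file may differ.
# The wildcard in /free-tier/* is omitted because urllib.robotparser does prefix matching.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: anthropic-ai
User-agent: Google-Extended
Disallow: /free-tier/
Allow: /

User-agent: *
Disallow: /free-tier/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def access_report(parser, agents, paths, origin="https://cockroachlabs.com"):
    """Map each (agent, path) pair to True if the parser allows the fetch."""
    return {
        (agent, path): parser.can_fetch(agent, origin + path)
        for agent in agents
        for path in paths
    }


parser = build_parser(SAMPLE_ROBOTS)
report = access_report(
    parser,
    agents=["GPTBot", "ClaudeBot", "ExampleBot"],  # ExampleBot falls through to the * group
    paths=["/", "/free-tier/start"],
)
for (agent, path), allowed in sorted(report.items()):
    print(f"{agent:12} {path:20} {'allow' if allowed else 'block'}")
```

Note that a crawler with its own named group ignores the `*` group entirely, so a Disallow intended for all bots has to be repeated inside every named group, as the sample does.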
Missing llms.txt
The /llms.txt endpoint is a Next.js catch-all route that returns a 404 status with a full React error page (48KB of JS bundles, noindex meta). This is not a static 404 — it is a server-rendered application error that wastes crawler budget and delivers zero structured content. For a database company whose entire value proposition is technical and whose documentation spans thousands of pages, the absence of an llms.txt content map is a significant missed signal for AI training pipelines and retrieval-augmented generation systems.
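The symptoms listed above (non-200 status, an HTML body, a noindex directive) can be turned into a small, network-free check. The sketch below is illustrative only: `FetchResult`, `diagnose_llms_txt`, the accepted media types, and the synthetic `error_shell` response are assumptions, not part of the audit tooling.

```python
from dataclasses import dataclass


@dataclass
class FetchResult:
    """The parts of an HTTP response that matter for this check."""
    status: int
    content_type: str
    body: bytes


def diagnose_llms_txt(result: FetchResult) -> list[str]:
    """Return a list of problems; an empty list means the response looks usable."""
    problems = []
    if result.status != 200:
        problems.append(f"status {result.status}, expected 200")
    # Strip parameters such as "; charset=utf-8" before comparing media types.
    media_type = result.content_type.split(";")[0].strip().lower()
    if media_type not in ("text/plain", "text/markdown"):
        problems.append(f"content type {media_type!r} is not plain text")
    head = result.body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        problems.append("body is an HTML document, not a content map")
    if b"noindex" in result.body.lower():
        problems.append("body carries a noindex directive")
    return problems


# Synthetic stand-in for the error shell described above (padding approximates its size).
error_shell = FetchResult(
    status=404,
    content_type="text/html; charset=utf-8",
    body=b'<!DOCTYPE html><html><head><meta name="robots" content="noindex"></head>'
    + b" " * 48_000,
)
healthy = FetchResult(200, "text/plain; charset=utf-8", b"# Cockroach Labs\n")

shell_problems = diagnose_llms_txt(error_shell)
healthy_problems = diagnose_llms_txt(healthy)
print(shell_problems)
print(healthy_problems)
```

Keeping the check as a pure function over a response record means the same logic can be run against responses captured for each crawler user agent.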
Cold-Knowledge Gap
The cold LLM knows Cockroach Labs as a distributed SQL database founded by ex-Google engineers (Spencer Kimball, Peter Mattis, Ben Darnell), with a $5B valuation, Series F funding, and a 2024 layoff round. The site itself has pivoted aggressively toward AI agent memory and vector search — the homepage H1 is "The system of record to power your business," and dedicated pages pitch "The system of record for agentic memory" with MCP server, LangChain integrations, and distributed vector indexing. The cold model knows nothing about the AI-agent positioning, the MCP server, the ccloud CLI for agents, or the "CockroachDB for AI" product line. The gap between the model's stale "distributed SQL database" prior and the site's current "agentic memory infrastructure" narrative is the single largest AI-visibility risk.
Schema Posture
The site carries rich JSON-LD. The homepage has a Service type with hasOfferCatalog (three deployment tiers) and an ItemList for future-proofing guides. A WebApplication type with an exhaustive feature list, including vector search, RAG support, and LangChain integration, was also found, but it lives inside the 404 error page's JSON-LD (served on /llms.txt) rather than on the homepage itself. The homepage Service schema does not use SoftwareApplication or WebApplication as its primary type, which may dilute how knowledge graphs classify the product. Compare pages (e.g., CockroachDB vs PostgreSQL) carry only BreadcrumbList schema, with no Product or other comparison-oriented markup, despite containing detailed feature-by-feature comparison tables in visible HTML.
External Signals
DNS TXT records confirm domain verification for OpenAI (openai-domain-verification), Anthropic (anthropic-domain-verification), Cursor (cursor-domain-verification), and Cisco — indicating active integrations or partnerships with AI platform vendors. The OpenAI and Anthropic verification tokens suggest the domain has been submitted for crawler access or API usage tracking, consistent with the permissive robots.txt. No external review sites (Gartner, Forrester, Reddit) surfaced in search, suggesting the brand's external AI-era narrative is still forming primarily through its own blog and documentation.
Findings
Missing /llms.txt returns 48KB error shell for all AI crawlers High
The /llms.txt endpoint returns a 404 status with a full Next.js error page (48KB of JS bundles, noindex meta) for GPTBot, ClaudeBot, and all tested AI crawlers. This wastes crawler budget and delivers no structured content, missing a key signal for AI training and RAG systems.
What to change: Replace the catch-all route with a static /llms.txt file that lists key documentation pages, blog posts, and product pages in plain text, following the llms.txt standard.
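A static file of this kind can be generated from a short page inventory. The sketch below assumes the layout described by the llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of links). The `render_llms_txt` helper, the section names, the `https://cockroachlabs.com` origin, and the placeholder summary and descriptions are hypothetical; only the four paths come from this audit.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt document from {section heading: [(name, url, note), ...]}."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


ORIGIN = "https://cockroachlabs.com"  # assumed canonical origin

llms_txt = render_llms_txt(
    title="Cockroach Labs",
    summary="PLACEHOLDER: one-sentence description supplied by the site owner.",
    sections={
        "Docs": [
            ("Documentation", f"{ORIGIN}/docs/stable/", "PLACEHOLDER description"),
        ],
        "Product": [
            ("AI product page", f"{ORIGIN}/product/ai/", "PLACEHOLDER description"),
            ("AI use cases", f"{ORIGIN}/solutions/usecases/ai/", "PLACEHOLDER description"),
        ],
        "Blog": [
            ("Blog", f"{ORIGIN}/blog/", "PLACEHOLDER description"),
        ],
    },
)
print(llms_txt)
```

The output would be written to a static asset served at /llms.txt with a text content type, so the request never reaches the catch-all route.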
Cold LLM knowledge lacks current AI-agent positioning High
The cold LLM knows Cockroach Labs only as a distributed SQL database, with no awareness of its AI-agent memory, MCP server, LangChain integrations, or vector search capabilities. The site's current narrative as 'agentic memory infrastructure' is invisible to AI models that have not crawled recent content.
What to change: Publish an /llms.txt file and ensure key AI-related pages are indexed and linked from the homepage to accelerate model ingestion of the new positioning.
WebApplication schema served on 404 error page instead of homepage Medium
The WebApplication JSON-LD with vector search, RAG, and LangChain features is embedded in the 404 error page served at /llms.txt, not on the homepage. This means the schema is only accessible to crawlers that hit the error page, reducing its visibility for knowledge graph classification.
What to change: Move the WebApplication schema to the homepage and product pages, and ensure it uses SoftwareApplication or WebApplication as the primary type.
Homepage Service schema not using SoftwareApplication type Medium
The homepage's primary JSON-LD uses Service type with hasOfferCatalog, rather than SoftwareApplication or WebApplication. This may dilute how knowledge graphs classify CockroachDB as a software product, potentially affecting rich results.
What to change: Add a SoftwareApplication or WebApplication schema as the primary type on the homepage, with Service as a supplementary type if needed.
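As a rough illustration of this change, the sketch below builds a SoftwareApplication JSON-LD object and wraps it in a script tag. The helper name, the `applicationCategory` value, the origin URL, and the tier names are placeholders (the audit does not name the three tiers); the feature strings are the ones the audit reports in the existing WebApplication markup.

```python
import json


def software_application_jsonld(name, url, features, tiers):
    """Build a schema.org SoftwareApplication object as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "applicationCategory": "DeveloperApplication",  # assumed category value
        "featureList": list(features),
        "offers": [{"@type": "Offer", "name": tier} for tier in tiers],
    }


schema = software_application_jsonld(
    name="CockroachDB",
    url="https://cockroachlabs.com/",  # assumed canonical URL
    features=["vector search", "RAG support", "LangChain integration"],
    tiers=["PLACEHOLDER tier 1", "PLACEHOLDER tier 2", "PLACEHOLDER tier 3"],
)

# The tag that would be rendered into the homepage <head>.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(schema, indent=2)
    + "</script>"
)
print(script_tag)
```

Whether Service stays alongside this block or is folded into it is a judgment call for the site owner; the sketch only shows the primary type.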
Comparison pages lack Product or comparison schema Medium
Pages like CockroachDB vs PostgreSQL contain detailed feature-by-feature comparison tables in visible HTML but only carry BreadcrumbList schema. Missing structured data for comparisons reduces the chance of appearing in AI-generated comparison snippets.
What to change: Add Product markup to comparison pages (schema.org defines no ComparisonTable type), encoding the feature differences in structured data, for example as additionalProperty values.
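One possible encoding, sketched under assumptions, is an ItemList of Product entries whose table cells become additionalProperty values. The `comparison_jsonld` helper and the placeholder feature rows are hypothetical; real values would be copied from the visible table on each page.

```python
import json


def comparison_jsonld(products, rows):
    """Turn table rows [(feature, {product: value}), ...] into an ItemList of Products."""
    items = []
    for position, product in enumerate(products, start=1):
        properties = [
            {"@type": "PropertyValue", "name": feature, "value": values[product]}
            for feature, values in rows
            if product in values  # skip cells the table leaves empty
        ]
        items.append({
            "@type": "ListItem",
            "position": position,
            "item": {
                "@type": "Product",
                "name": product,
                "additionalProperty": properties,
            },
        })
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "itemListElement": items,
    }


# Placeholder rows; the feature names and values are not taken from the real page.
rows = [
    ("PLACEHOLDER feature A", {"CockroachDB": "value from page", "PostgreSQL": "value from page"}),
    ("PLACEHOLDER feature B", {"CockroachDB": "value from page"}),
]
comparison = comparison_jsonld(["CockroachDB", "PostgreSQL"], rows)
print(json.dumps(comparison, indent=2))
```

Generating the markup from the same data that renders the visible table keeps the structured data and the HTML from drifting apart.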
robots.txt disallows /free-tier/* for all bots Low
The /free-tier/* path is disallowed for all bots, blocking crawlers from a pricing-adjacent page that may contain competitive intelligence or product details. This could limit visibility of pricing information in AI responses.
What to change: Review the /free-tier/ page content and consider allowing bots if it contains useful product information that should be indexed.
No external review site mentions surfaced in search Low
Searches for CockroachDB on Gartner, Forrester, and Reddit returned no results, indicating limited external validation signals that AI models might use for credibility assessment.
What's working
- Robots.txt explicitly allows major AI crawlers: The robots.txt names GPTBot, PerplexityBot, ClaudeBot, anthropic-ai, and Google-Extended with Allow: / directives, ensuring these crawlers can access the full site.
- All 11 tested AI bots receive 200 with full content on homepage: Every AI bot tested receives a 200 status with full HTML content on the homepage, identical to a browser baseline, indicating no bot-specific blocking or cloaking.
- Next.js server-side rendering eliminates JS rendering risk for crawlers: The site runs on Netlify with Next.js SSR, so all content is available in the initial HTML response, avoiding the common pitfall of JS-dependent content being invisible to crawlers.
- Site carries rich JSON-LD with Service, ItemList, and WebApplication types: The homepage includes structured data for Service (with offer catalog) and ItemList. A WebApplication type (with vector search, RAG, LangChain features) also exists, although it currently surfaces only on the error page served at /llms.txt. Together these give search engines and AI crawlers a strong semantic foundation to build on.
- Domain verified for OpenAI, Anthropic, and Cursor via DNS TXT records: DNS TXT records confirm domain verification for OpenAI, Anthropic, and Cursor, indicating active integrations or partnerships that may facilitate crawler access or API usage.
- Dedicated AI product pages with detailed content: Pages like /product/ai/ and /solutions/usecases/ai/ provide rich, technical content about AI agent memory, MCP server, and LangChain integrations, which are valuable for AI crawlers to index.
- Comprehensive documentation site with sitemap: The docs site at /docs/stable/ contains extensive technical documentation, and a sitemap at /docs/sitemap.xml is available, aiding crawler discovery.
- Active blog with regular technical content: The blog at /blog/ contains regular posts, providing a steady stream of content for AI crawlers to index and learn from.