AI Site Grade
elastic.co — AI Site Grade
Elastic.co's homepage lacks any semantic schema and its llms.txt returns 404, while the cold LLM prior remains stuck on the 2021 SSPL controversy and ELK Stack identity.
Elastic.co has strong crawler access and rich AI-aligned content, but key pages carry nothing beyond a bare WebPage schema and llms.txt is missing, which leaves AI crawlers without semantic understanding while the cold LLM prior lags behind the site's repositioning.
- Findings: 11
- Evidence checks: 22
- Completed: 30 May 2026
Analysis
The Search AI Company has no llms.txt and zero structured schema on its homepage
Elastic.co presents itself as "The Search AI Company" and runs on Next.js with nginx behind Varnish cache, yet the homepage — the single most important page for AI crawler comprehension — contains no JSON-LD schema of any semantic type (no Organization, SoftwareApplication, WebSite, BreadcrumbList, or FAQPage). The only JSON-LD present is a bare WebPage with a headline string. This is a striking gap for a company whose core product is a search and AI platform that AI crawlers would naturally want to understand structurally.
Crawler Access
All 11 tested AI bot user-agents, including GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User, anthropic-ai, Bytespider, Applebot-Extended, and Perplexity-User, receive a 200 status with the same 599KB HTML payload a browser receives. There is no UA-based blocking, no Cloudflare challenge, and no JS shell. The robots.txt has a single User-Agent: * group with an extensive allowlist for documentation paths under /guide/en/* but no explicit AI-bot directives. The llms.txt request returns a 404 (serving the Next.js 404 page shell, ~78KB). The sitemap.xml is a single flat file containing 11,605 URLs, which gives crawlers comprehensive coverage.
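The user-agent comparison described above can be approximated with a short script. This is a minimal sketch, not the audit's actual tooling: the function names, the 5% size tolerance, and the shortened agent tokens are assumptions (real bots send longer UA strings), and the fetcher is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Short tokens taken from the report; production bots send longer UA strings.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "OAI-SearchBot"]
# Hypothetical browser UA used as the comparison baseline.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36"


def http_fetch(url: str, user_agent: str) -> Tuple[int, int]:
    """Return (HTTP status, body length in bytes) for one GET request."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body (for example a 404 page shell).
        return err.code, len(err.read())


def audit_user_agents(
    url: str,
    agents: List[str],
    fetch: Callable[[str, str], Tuple[int, int]] = http_fetch,
    tolerance: float = 0.05,
) -> Dict[str, dict]:
    """Compare each agent's response with the browser baseline."""
    base_status, base_size = fetch(url, BROWSER_UA)
    report = {}
    for agent in agents:
        status, size = fetch(url, agent)
        # Treat payloads within `tolerance` of the baseline size as equivalent.
        same_size = base_size > 0 and abs(size - base_size) <= tolerance * base_size
        report[agent] = {
            "status": status,
            "bytes": size,
            "matches_browser": status == base_status and same_size,
        }
    return report
```

Calling audit_user_agents("https://www.elastic.co/", AI_AGENTS) would issue live requests; a result where every agent has matches_browser set to True corresponds to the "no UA-based blocking" observation.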
Cold-Knowledge Gap
The cold LLM knows Elastic primarily as "the ELK Stack company" (Elasticsearch, Kibana, Logstash, Beats) and prominently recalls the 2021 SSPL license controversy and the AWS OpenSearch fork. The site itself has aggressively repositioned around "Search AI," "Agent Builder," "Elastic Workflows," and "The Elasticsearch Platform," with the homepage tagline "Better retrieval. Better answers." The model's prior has not caught up with this AI-platform narrative, which now includes agentic capabilities, FedRAMP High authorization, and Jina AI model integrations. The licensing controversy, a major reputational signal in the model's prior, is entirely absent from the site's visible content, and the /licensing page returns a 404.
Schema Posture
Across all pages inspected (homepage, /elasticsearch, /what-is/search-ai, /what-is/retrieval-augmented-generation, /elasticsearch/agent-builder, /about, /blog), the only JSON-LD type used is WebPage. The vector database page (/elasticsearch/vector-database) is the sole exception, carrying a proper FAQPage schema with six Q&A entries. No page uses Organization, SoftwareApplication, Product, BreadcrumbList, HowTo, or TechArticle schema. The blog listing page has no BlogPosting or Blog schema. The "what-is" pages (comprehensive educational content ideal for AI answer extraction) have no Article or TechArticle schema.
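A schema inventory like the one above can be reproduced by collecting every @type value from a page's JSON-LD blocks. The following is a sketch using only the standard library; the class and function names are hypothetical, and it reports nested types (such as Question inside FAQPage) alongside top-level ones.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def jsonld_types(html):
    """Return a sorted list of every @type found in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()

    def walk(node):
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for raw in collector.blocks:
        try:
            walk(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than abort the audit
    return sorted(found)
```

A page whose only markup is the bare WebPage object would yield ["WebPage"], which matches the posture reported for most inspected pages.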
External Signals
DNS TXT records include an anthropic-domain-verification token, which suggests Elastic has verified the domain with Anthropic, plausibly for crawler or API access. DNS is served from AWS Route53 nameservers, while the site's IP sits in a 34.x range attributed to Google Cloud, so DNS and hosting appear to be split across the two providers. The site is served via nginx with Varnish caching and built on Next.js. The content-security-policy header restricts frame-ancestors to self and a few SaaS domains. The blog is actively publishing (May 2026 posts visible), covering agentic AI, FedRAMP, and Google Cloud partnerships. This content aligns with the new positioning but lacks structured markup to help AI engines surface it.
Findings
llms.txt returns 404 High
The llms.txt file at elastic.co/llms.txt returns a 404 status, serving a Next.js 404 page shell. This prevents AI crawlers from discovering a curated set of important URLs.
What to change: Create an llms.txt file at the root listing key documentation, product pages, and blog URLs for AI crawlers.
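One way to produce such a file is to generate it from a small curated mapping. The sketch below assumes the layout used by the community llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links); the host name, section names, and link descriptions are illustrative placeholders, and only the paths come from this report.

```python
BASE = "https://www.elastic.co"  # assumed canonical host

# Paths are pages named in this report; notes are placeholder descriptions.
EXAMPLE_SECTIONS = {
    "Products": [
        ("Elasticsearch", f"{BASE}/elasticsearch", "Product overview"),
        ("Agent Builder", f"{BASE}/elasticsearch/agent-builder", "Building custom AI agents"),
        ("Vector database", f"{BASE}/elasticsearch/vector-database", "Vector search capabilities"),
    ],
    "Learn": [
        ("What is Search AI", f"{BASE}/what-is/search-ai", "Educational explainer"),
        ("What is RAG", f"{BASE}/what-is/retrieval-augmented-generation", "Educational explainer"),
    ],
}


def build_llms_txt(title, summary, sections):
    """Render an llms.txt document from a {section: [(name, url, note)]} mapping."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"
```

Writing the result of build_llms_txt("Elastic", "The Search AI Company", EXAMPLE_SECTIONS) to the web root would replace the current 404 with a curated index; which URLs belong in it is an editorial decision this sketch does not make.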
Homepage lacks any semantic JSON-LD schema High
The homepage contains only a bare WebPage schema with a headline string. No Organization, SoftwareApplication, WebSite, or BreadcrumbList schema is present, leaving AI crawlers without structured understanding of the company or its product.
What to change: Add Organization, SoftwareApplication, and WebSite JSON-LD schemas to the homepage.
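A possible shape for that markup is a single JSON-LD graph in which the WebSite and SoftwareApplication nodes reference the Organization by @id. This is a hedged sketch: the @id fragments, the applicationCategory value, and the host name are assumptions, and a real deployment would add properties such as logo and sameAs that this report does not supply.

```python
import json


def homepage_jsonld(base_url):
    """Build an Organization + WebSite + SoftwareApplication graph."""
    org_id = f"{base_url}/#organization"  # assumed @id convention
    graph = [
        {
            "@type": "Organization",
            "@id": org_id,
            "name": "Elastic",
            "url": f"{base_url}/",
            "description": "The Search AI Company",
        },
        {
            "@type": "WebSite",
            "@id": f"{base_url}/#website",
            "name": "Elastic",
            "url": f"{base_url}/",
            "publisher": {"@id": org_id},
        },
        {
            "@type": "SoftwareApplication",
            "name": "Elasticsearch",
            "applicationCategory": "DeveloperApplication",  # assumed category
            "url": f"{base_url}/elasticsearch",
            "publisher": {"@id": org_id},
        },
    ]
    return {"@context": "https://schema.org", "@graph": graph}


def as_script_tag(data):
    """Serialize JSON-LD for embedding in the page <head>."""
    # Escape "</" so the payload cannot terminate the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Linking nodes by @id keeps the company described once, so the same Organization node could also be referenced from the /about page instead of being duplicated.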
Cold LLM prior is stuck on SSPL controversy and ELK Stack High
The cold LLM knowledge associates Elastic primarily with the 2021 SSPL license controversy and the AWS OpenSearch fork, and identifies it as the ELK Stack company. The site's repositioning to 'Search AI Company' with agentic capabilities is not reflected in the model's prior.
What to change: Publish content that directly addresses the licensing history and current open-source stance, and ensure structured schema reinforces the new brand identity.
Licensing page returns 404 Medium
The /licensing page returns a 404, which is problematic because the cold LLM prior includes the SSPL controversy. A missing licensing page prevents the site from providing authoritative information to AI crawlers.
What to change: Restore or redirect the /licensing page to a page that explains the current licensing model.
Blog listing page lacks BlogPosting or Blog schema Medium
The blog listing page has no BlogPosting or Blog JSON-LD schema, reducing the ability of AI crawlers to understand the blog structure and surface individual posts.
What to change: Add Blog and BlogPosting schemas to the blog listing and individual blog pages.
Educational 'what-is' pages lack Article or TechArticle schema Medium
Pages like /what-is/search-ai and /what-is/retrieval-augmented-generation contain comprehensive educational content but have no Article or TechArticle schema, limiting their discoverability as authoritative answers.
What to change: Add TechArticle or Article schema to all 'what-is' pages.
About page lacks Organization schema Medium
The /about page has no Organization schema, missing an opportunity to provide AI crawlers with structured company information.
What to change: Add Organization schema to the about page.
No BreadcrumbList schema on any page Low
No inspected page includes BreadcrumbList schema, which helps AI crawlers understand site hierarchy and navigation.
What to change: Add BreadcrumbList schema to all pages with breadcrumb navigation.
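Where breadcrumbs mirror the URL path, the markup can be derived mechanically. The sketch below makes that assumption and also invents display names by title-casing path segments; both are simplifications, since real breadcrumb labels come from the site's navigation and not every intermediate path (for example /what-is) is necessarily a page.

```python
from urllib.parse import urlsplit


def breadcrumb_jsonld(page_url, home_name="Home"):
    """Derive a BreadcrumbList from a URL's path segments."""
    parts = urlsplit(page_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]

    items = [{"@type": "ListItem", "position": 1, "name": home_name, "item": origin + "/"}]
    path = ""
    for position, segment in enumerate(segments, start=2):
        path += "/" + segment
        items.append({
            "@type": "ListItem",
            "position": position,
            # Placeholder label: "vector-database" becomes "Vector Database".
            "name": segment.replace("-", " ").title(),
            "item": origin + path,
        })
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }
```

For https://www.elastic.co/elasticsearch/vector-database this yields three list items: the home page, /elasticsearch, and the vector database page itself.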
Product pages lack SoftwareApplication schema Medium
Pages like /elasticsearch and /elasticsearch/vector-database do not use SoftwareApplication schema, which would help AI crawlers understand the product capabilities.
What to change: Add SoftwareApplication schema to all product pages.
robots.txt has no explicit AI-bot directives Low
The robots.txt file contains only a single User-Agent: * group with allow directives for documentation paths. No explicit groups exist for AI bots such as GPTBot or ClaudeBot, so every AI crawler falls back to the wildcard rules and there is no way to treat them differently from other bots.
What to change: Add explicit directives for AI bots if any paths should be disallowed or allowed.
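Before shipping bot-specific rules, it helps to check how a parser resolves them. The sketch below uses Python's urllib.robotparser on an illustrative robots.txt; the /internal/ and /drafts/ paths are hypothetical, and only the /guide/en/ allowlist path comes from this report.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: /internal/ and /drafts/ are hypothetical paths.
ROBOTS_TXT = """\
User-agent: *
Allow: /guide/en/
Disallow: /internal/

User-agent: GPTBot
Disallow: /internal/
Disallow: /drafts/
"""


def can_fetch(robots_txt, agent, url):
    """Evaluate one agent/URL pair against a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def access_matrix(robots_txt, agents, urls):
    """Return {agent: {url: allowed}} for a quick review table."""
    return {
        agent: {url: can_fetch(robots_txt, agent, url) for url in urls}
        for agent in agents
    }
```

A bot with its own group uses that group instead of the wildcard one, so any rule that should still apply to it has to be repeated there, as /internal/ is in this example. ClaudeBot, which has no group here, keeps following the User-agent: * rules.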
Sitemap is a single flat file with 11,605 URLs Low
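The sitemap.xml contains 11,605 URLs in a single file without an index. This is within the sitemap protocol's limit of 50,000 URLs per file, so the file is valid, but a single large sitemap may be less efficient for crawlers to fetch and reprocess when only a few sections change.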
What to change: Consider splitting the sitemap into multiple files with a sitemap index for better crawl efficiency.
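A split along these lines can be generated in a few steps: chunk the URL list, render one urlset file per chunk, and render an index that points at them. This is a sketch under assumptions: the 5,000-URL chunk size and the sitemap-N.xml naming are arbitrary choices, and grouping by site section (docs, blog, product) may suit a real deployment better than fixed-size chunks.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def chunk(urls, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_urlset(urls):
    """Render one sitemap file containing the given page URLs."""
    body = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">\n{body}</urlset>\n'


def render_index(sitemap_urls):
    """Render the sitemap index that lists each child sitemap."""
    body = "".join(f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls)
    return f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">\n{body}</sitemapindex>\n'


def split_sitemap(urls, base_url, size=5000):
    """Return (index_xml, {filename: sitemap_xml}) for a flat URL list."""
    files = {}
    for n, part in enumerate(chunk(urls, size), start=1):
        files[f"sitemap-{n}.xml"] = render_urlset(part)  # hypothetical naming
    index = render_index([f"{base_url}/{name}" for name in files])
    return index, files
```

With 11,605 URLs and the assumed chunk size of 5,000, this produces three child sitemaps plus the index.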
What's working
- All 11 tested AI bots receive full access — All tested AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) receive 200 status with the same HTML as a browser, with no blocking or JS shells.
- Vector database page has proper FAQPage schema — The /elasticsearch/vector-database page includes a valid FAQPage schema with six Q&A entries, aiding AI crawlers in extracting structured answers.
- DNS TXT record confirms Anthropic relationship — The DNS TXT records include an anthropic-domain-verification token, indicating an active relationship with Anthropic for crawler or API access.
- Sitemap contains 11,605 URLs covering the site — The sitemap.xml is comprehensive with over 11,000 URLs, ensuring most pages are discoverable by crawlers.
- Blog publishes AI-aligned content regularly — The blog has recent posts (May 2026) covering agentic AI, FedRAMP, and Google Cloud partnerships, aligning with the new AI platform narrative.
- Comprehensive educational 'what-is' pages — Pages like /what-is/search-ai and /what-is/retrieval-augmented-generation provide in-depth, authoritative content ideal for AI answer extraction.
- Agent Builder page showcases AI agent capabilities — The /elasticsearch/agent-builder page describes building custom AI agents, directly supporting the AI platform narrative.
- Subscriptions page provides detailed product tiers — The /subscriptions page contains 4,574 words of detailed product tier information, useful for AI crawlers understanding offerings.