scylladb.com — AI Site Grade
ScyllaDB's homepage delivers only 18 words of visible text to AI crawlers despite all major bots receiving a 200 status — a JS-rendering gap that undermines the site's sophisticated AI-readiness infrastructure.
ScyllaDB's AI visibility is undermined by a JS-rendering gap on key pages and a cold-knowledge disconnect between its AI positioning and external model memory, despite strong llms.txt and FAQ schema.
- Findings: 7
- Evidence checks: 23
- Completed: 30 May 2026
Analysis
Crawler Access
All major AI crawlers (GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, PerplexityBot, ChatGPT-User, anthropic-ai) receive a 200 status and an identical byte payload from Cloudflare. Only Bytespider is blocked, with a 403. The robots.txt is minimal: a single User-agent: * rule disallowing only /wp-admin/, with no AI-bot-specific directives. The llms.txt is present and unusually detailed. It lists approved high-value paths, explicitly excludes admin/legal/account routes, and includes crawl-delay guidance. This is an advanced AI-readiness signal that most comparable database vendors lack.
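Python's standard-library robots.txt parser offers a quick way to confirm what this robots.txt permits. The sketch below assumes the file contains exactly the single wildcard group described above. That group is reconstructed from this description, not fetched from the live site. Under that assumption every listed agent, Bytespider included, may fetch everything except /wp-admin/. This suggests the Bytespider 403 comes from a server or CDN rule rather than from robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: reconstructed from the audit's description, not fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

AI_AGENTS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "OAI-SearchBot",
    "PerplexityBot", "ChatGPT-User", "anthropic-ai", "Bytespider",
]


def robots_permissions(robots_txt: str, agents: list[str],
                       paths: list[str]) -> dict[str, dict[str, bool]]:
    """Return {agent: {path: allowed}} according to the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {
        agent: {path: parser.can_fetch(agent, path) for path in paths}
        for agent in agents
    }


if __name__ == "__main__":
    table = robots_permissions(ROBOTS_TXT, AI_AGENTS, ["/", "/product/", "/wp-admin/"])
    for agent, allowed in table.items():
        print(f"{agent:16} {allowed}")
```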
JS-Shell Problem
Despite all bots getting 200s, the homepage returns only 18 words of visible text from a plain GET. The content is rendered client-side via JavaScript (React-based navigation, lazy-loaded sections). The /product/ page (28 words), /blog/ (10 words), /users/ (18 words), and /scale-real-time-ai/ (45 words) all exhibit the same pattern. By contrast, the /fit/ page delivers 864 words of readable text, and the /compare/ page delivers 785 words — these are server-rendered. The inconsistency means AI crawlers that do not execute JavaScript get a near-empty shell on the most important pages (homepage, product, blog), while deeper informational pages are text-rich.
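The word counts above come from the audit's own tooling. The sketch below gives a rough approximation of that check. It fetches the raw HTML with a chosen User-Agent, with no JavaScript execution, and counts whitespace-separated words outside script, style, template, and title elements. The skip list and the fetch helper are assumptions for illustration, so its counts may differ from the audit's figures.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Elements whose text a non-rendering crawler would not treat as page copy.
# This skip list is an assumption, not the audit's documented method.
SKIP_TAGS = {"script", "style", "template", "title"}


class VisibleWordCounter(HTMLParser):
    """Counts whitespace-separated words outside SKIP_TAGS elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.count += len(data.split())


def visible_word_count(html: str) -> int:
    """Count words a non-JS client would see in the raw HTML."""
    counter = VisibleWordCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def fetch_raw_html(url: str, user_agent: str) -> str:
    """Plain GET with a chosen User-Agent; no JavaScript is executed."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `visible_word_count(fetch_raw_html("https://www.scylladb.com/", "GPTBot"))` approximates what a non-rendering crawler sees (the www host is assumed). The bare token "GPTBot" stands in for the crawler's full User-Agent string, and a CDN may treat a spoofed agent differently from the real crawler.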
Cold-Knowledge Gap
Language models' prior knowledge (what they recall without browsing) describes ScyllaDB as a Cassandra-compatible, C++ NoSQL database used by Discord, Comcast, and Expedia. In that prior, it has a reputation for 10x Cassandra throughput and a shared-nothing Seastar architecture. The site itself now positions entirely around "Real-Time AI": vector search, feature stores, agent state management, and billion-scale embeddings. That prior knowledge mentions no AI positioning at all. The site has fully rebranded around AI workloads (title tag: "ScyllaDB For Real-Time AI"), but external model memory still reflects the pre-2024 positioning as a Cassandra/DynamoDB alternative. In short, the gap spans the entire AI narrative.
Schema Posture
The homepage and most pages carry Organization, WebSite, WebPage, and BreadcrumbList schema. This is solid but generic. The /product/ page includes a FAQPage schema with 5 architecture questions (shard-per-core, I/O schedulers, caching, tablets, workload prioritization). The /vector-search/ page also has a FAQPage with 5 AI-specific Q&A pairs. However, the FAQ schema on /product/ is truncated mid-question ("How is ScyllaDB different from other NoSQL databases..."), and the answer is cut off in the JSON-LD. No HowTo, TechArticle, or SoftwareApplication schema is used anywhere, even though ScyllaDB is a software product.
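An inventory like this one can be reproduced by extracting a page's JSON-LD blocks and collecting their @type values, including those nested inside an @graph array. The sample markup below is illustrative and only shaped like the audit's description; it is not copied from scylladb.com.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self.blocks.append("".join(self._buffer))


def schema_types(html: str) -> set[str]:
    """Return every @type found in the page's JSON-LD, including nested nodes."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    found: set[str] = set()

    def walk(node) -> None:
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in extractor.blocks:
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            found.add("<invalid JSON-LD>")
    return found


# Illustrative markup shaped like the audit's description (not copied from the site).
SAMPLE_PAGE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "name": "ScyllaDB"},
  {"@type": "WebSite", "name": "ScyllaDB"},
  {"@type": "WebPage", "name": "Home"},
  {"@type": "BreadcrumbList", "itemListElement": []}
]}
</script></head><body></body></html>"""

WANTED = {"SoftwareApplication", "TechArticle", "HowTo"}

if __name__ == "__main__":
    types = schema_types(SAMPLE_PAGE)
    print(sorted(types))
    print("missing:", sorted(WANTED - types))
```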
Content Archaeology
The sitemap contains 1,155+ URLs across 13 sub-sitemaps. A large fraction are blog posts from 2015–2017 (release notes for versions 0.11 through 1.7, Seastar meetups, old benchmarks). These stale pages remain indexed and listed in the sitemap, diluting the AI-relevant signal. The blog is active (May 2026 posts), but the archive is cluttered with decade-old release notes on which AI crawlers may spend crawl budget. The llms.txt explicitly approves /blog/* without distinguishing fresh from archival content.
Findings
Key pages render as near-empty JS shells for AI crawlers (High)
The homepage (18 words), /product/ (28), /blog/ (10), and /users/ (18) return only 10–28 words of visible text from a plain GET, and /scale-real-time-ai/ returns 45. Deeper pages like /fit/ and /compare/ deliver 785–864 words. AI crawlers that do not execute JavaScript receive minimal content on the most important pages.
What to change: Implement server-side rendering (SSR) or static pre-rendering for the homepage, product, blog, and users pages to ensure AI crawlers receive full text content.
External model memory lacks current AI positioning (High)
Models' prior knowledge describes ScyllaDB as a Cassandra-compatible NoSQL database used by Discord and Expedia, with no mention of AI workloads. The site now positions entirely around 'Real-Time AI' with vector search and agent state management. This creates a disconnect between the site's narrative and external knowledge.
What to change: Increase off-site signals (press releases, technical blog posts, conference talks) that explicitly tie ScyllaDB to AI use cases to update the external model memory.
FAQ schema on /product/ page is truncated mid-question (Medium)
The FAQPage JSON-LD on the /product/ page contains a question that is cut off ('How is ScyllaDB different from other NoSQL databases...') with the answer missing. This may cause AI crawlers to parse incomplete or misleading information.
What to change: Complete the truncated FAQ entry by providing the full question and answer in the JSON-LD.
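A simple pre-deploy check can flag incomplete FAQ entries. The sketch below treats a question as suspect if its name is empty or ends in an ellipsis, and it flags any Question without acceptedAnswer.text. The ellipsis rule is a heuristic assumption. Apart from the truncated question quoted above, the sample entries are placeholders.

```python
def faq_problems(faq: dict) -> list[str]:
    """Flag FAQPage Question entries that look incomplete.

    Heuristic (an assumption): a question is suspect if its name is empty
    or ends with an ellipsis, and every Question needs acceptedAnswer.text.
    """
    entities = faq.get("mainEntity", [])
    if isinstance(entities, dict):  # a single Question may be given as an object
        entities = [entities]
    problems: list[str] = []
    for number, question in enumerate(entities, start=1):
        name = (question.get("name") or "").strip()
        answer = question.get("acceptedAnswer") or {}
        if isinstance(answer, list):
            answer = answer[0] if answer else {}
        text = (answer.get("text") or "").strip()
        if not name:
            problems.append(f"Q{number}: missing question text")
        elif name.endswith(("...", "\u2026")):
            problems.append(f"Q{number}: question looks truncated: {name!r}")
        if not text:
            problems.append(f"Q{number}: missing acceptedAnswer.text")
    return problems


SAMPLE_FAQ = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": "Placeholder question?",
         "acceptedAnswer": {"@type": "Answer", "text": "Placeholder answer."}},
        # The truncated entry as quoted in the audit, with no answer attached.
        {"@type": "Question",
         "name": "How is ScyllaDB different from other NoSQL databases..."},
    ],
}

if __name__ == "__main__":
    for problem in faq_problems(SAMPLE_FAQ):
        print(problem)
```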
No SoftwareApplication or TechArticle schema on product pages (Medium)
Although ScyllaDB is a database product, the site uses only generic Organization, WebSite, WebPage, and BreadcrumbList schema, plus FAQPage on selected pages. Adding SoftwareApplication schema with properties like applicationCategory, operatingSystem, and offers would help AI crawlers understand the product.
What to change: Add SoftwareApplication schema to the /product/ page with relevant properties (applicationCategory: 'Database', operatingSystem: 'Linux', etc.).
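A minimal SoftwareApplication block for /product/ could be generated as below. applicationCategory and operatingSystem use the values suggested above. The canonical URL (https://www.scylladb.com/product/) is an assumption, and the description is a placeholder. offers is omitted because this audit gives no pricing details; add it only with real values.

```python
import json


def software_application_jsonld(name: str, url: str, category: str,
                                operating_system: str, description: str) -> dict:
    """Build a schema.org SoftwareApplication node as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "applicationCategory": category,
        "operatingSystem": operating_system,
        "description": description,
    }


def as_script_tag(node: dict) -> str:
    """Serialize the node into an embeddable JSON-LD <script> element."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(node, indent=2)
            + "\n</script>")


node = software_application_jsonld(
    name="ScyllaDB",
    url="https://www.scylladb.com/product/",  # assumed canonical URL
    category="Database",                       # value suggested in the audit
    operating_system="Linux",                  # value suggested in the audit
    description="PLACEHOLDER: one-sentence product description",
)

if __name__ == "__main__":
    print(as_script_tag(node))
```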
Sitemap contains 1,155+ URLs with many stale blog posts from 2015–2017 (Medium)
A large fraction of the sitemap URLs are decade-old release notes and meetup posts that dilute the AI-relevant signal. AI crawlers may waste crawl budget on outdated content.
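What to change: Remove stale blog posts (pre-2020) from the sitemap, or move them into a separate archive sitemap so that evergreen content has its own. Add noindex to archival posts with no lasting value, because leaving a URL out of a sitemap does not by itself remove it from search indexes.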
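One way to implement the split is to partition existing sitemap entries by their <lastmod> date. The sketch below uses a 2020-01-01 cutoff to match the pre-2020 suggestion. It keeps entries without a <lastmod> in the current set, so nothing is dropped silently. The example.com URLs are placeholders, and real lastmod values may not reflect original publication dates.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_sitemap(xml_text: str, cutoff: date) -> tuple[list[str], list[str]]:
    """Partition <url> entries into (current, archival) by <lastmod> date.

    Entries without <lastmod> count as current so nothing is dropped silently.
    """
    root = ET.fromstring(xml_text)
    current, archival = [], []
    for url in root.findall(f"{{{NS}}}url"):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        lastmod = url.findtext(f"{{{NS}}}lastmod")
        # W3C datetime values start with YYYY-MM-DD; compare on the date part only.
        if lastmod and date.fromisoformat(lastmod.strip()[:10]) < cutoff:
            archival.append(loc)
        else:
            current.append(loc)
    return current, archival


def render_sitemap(locs: list[str]) -> str:
    """Serialize a list of URLs as a minimal <urlset> document."""
    ET.register_namespace("", NS)  # emit a default xmlns instead of ns0: prefixes
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc in locs:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


# Placeholder entries, not taken from scylladb.com.
SAMPLE_SITEMAP = (
    f'<urlset xmlns="{NS}">'
    "<url><loc>https://example.com/blog/old-release/</loc><lastmod>2016-05-01</lastmod></url>"
    "<url><loc>https://example.com/blog/new-post/</loc><lastmod>2026-05-20T08:00:00+00:00</lastmod></url>"
    "<url><loc>https://example.com/fit/</loc></url>"
    "</urlset>"
)

if __name__ == "__main__":
    current, archival = split_sitemap(SAMPLE_SITEMAP, date(2020, 1, 1))
    print(render_sitemap(current))
    print(render_sitemap(archival))
```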
Robots.txt lacks AI-bot-specific directives (Low)
The robots.txt has only a single User-agent: * rule disallowing /wp-admin/. It has no specific directives for GPTBot, ClaudeBot, or other AI crawlers, which misses an opportunity to guide crawl behavior.
What to change: Add AI-bot-specific directives in robots.txt to allow or disallow certain paths, complementing the llms.txt.
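The sketch below shows one way to generate per-bot groups. A crawler that finds a group naming it ignores the User-agent: * group, so each AI-bot group has to repeat /wp-admin/. The /account/ path is hypothetical because the audit does not give the real account route. The output can be checked with Python's standard-library parser.

```python
from urllib.robotparser import RobotFileParser


def build_robots(default_disallow: list[str],
                 bot_disallow: dict[str, list[str]]) -> str:
    """Emit a wildcard group followed by one group per named crawler.

    A crawler that matches a named group ignores the * group, so each
    bot group must repeat shared rules such as /wp-admin/.
    """
    def group(agent: str, disallows: list[str]) -> list[str]:
        # An empty "Disallow:" line means "allow everything" for that group.
        rules = [f"Disallow: {path}" for path in disallows] or ["Disallow:"]
        return [f"User-agent: {agent}", *rules, ""]

    lines = group("*", default_disallow)
    for agent, disallows in bot_disallow.items():
        lines += group(agent, disallows)
    return "\n".join(lines)


# "/account/" is a hypothetical path; the audit does not name the real route.
AI_RULES = {
    "GPTBot": ["/wp-admin/", "/account/"],
    "ClaudeBot": ["/wp-admin/", "/account/"],
    "PerplexityBot": ["/wp-admin/", "/account/"],
}

robots_txt = build_robots(["/wp-admin/"], AI_RULES)

if __name__ == "__main__":
    print(robots_txt)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    print(parser.can_fetch("GPTBot", "/account/settings"))
```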
Bytespider (ByteDance's AI crawler) receives a 403 block (Low)
While all major AI crawlers get a 200, Bytespider is blocked with a 403. This may limit visibility in Chinese AI ecosystems.
What to change: Allow Bytespider access if the site targets Chinese AI markets.
What's working
- Detailed llms.txt with approved paths and crawl-delay guidance — The llms.txt is unusually detailed, listing approved high-value paths, explicitly excluding admin/legal/account routes, and including crawl-delay guidance. This is an advanced AI-readiness signal that most comparable database vendors lack.
- FAQPage schema on /vector-search/ with AI-specific Q&A — The /vector-search/ page includes a FAQPage schema with 5 AI-specific Q&A pairs (e.g., 'How does ScyllaDB handle vector search at scale?'), helping AI crawlers understand the product's AI capabilities.
- Deep informational pages like /fit/ and /compare/ are server-rendered with rich text — Pages like /fit/ (864 words) and /compare/ (785 words) deliver full readable text to AI crawlers, providing substantial content for indexing.
- All major AI crawlers receive 200 status with identical content — GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, PerplexityBot, ChatGPT-User, and anthropic-ai all get 200 responses with the same byte payload, ensuring no discriminatory blocking.
- Blog is active with posts as recent as May 2026 — The blog continues to publish fresh content, indicating ongoing content marketing efforts that can attract AI crawlers.