AI Site Grade

capstonepub.com — AI Site Grade

Capstone blocks nearly every major AI crawler at the CloudFront edge while its robots.txt contains no AI-bot directives, creating a silent blockade that crawler operators cannot detect from the site's declared policy.

Capstone's AI visibility is crippled by a CloudFront WAF that silently blocks GPTBot, ClaudeBot, Google-Extended, and others, combined with zero schema markup and a cold-knowledge gap about its current product lineup.

Findings: 10
Evidence checks: 26
Completed: 30 May 2026

Analysis

Capstone blocks nearly every major AI crawler at the CloudFront edge: requests identifying as GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, PerplexityBot, and Applebot-Extended all receive 403 responses. The robots.txt, meanwhile, contains zero AI-bot directives, so the blockade is silent: nothing in the site's declared policy tells a crawler operator that it exists.

Crawler Access

The site runs on Drupal 9 behind nginx, fronted by an AWS CloudFront CDN. The robots.txt at www.capstonepub.com/robots.txt contains a single User-agent: * rule with no AI-specific directives: no mention of GPTBot, ClaudeBot, Google-Extended, or any other AI crawler. Yet the compare_bot_access check shows that requests sent with the user-agent strings GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, PerplexityBot, Applebot-Extended, ChatGPT-User, and Bytespider all receive HTTP 403 from CloudFront. Only anthropic-ai (an Anthropic user-agent token distinct from ClaudeBot) and Perplexity-User (Perplexity's agent for user-initiated fetches) get through, with full 200 responses and real content. Note that Google-Extended and Applebot-Extended are robots.txt control tokens rather than user agents that Google or Apple send when crawling, so the 403s for those two strings show how the block matches user-agent text, not what Googlebot or Applebot actually receive. The 403 is served by CloudFront itself rather than by nginx, which points to a WAF-level or Lambda@Edge block based on User-Agent string matching. That configuration is invisible from the robots.txt and would surprise any crawler operator relying on the declared rules.
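
The mismatch described above can be checked mechanically. The following is a minimal Python sketch, not the tool behind compare_bot_access: it parses a robots.txt body with the standard library's urllib.robotparser and flags user agents that the declared rules permit but that were observed to receive 403. The ROBOTS_TXT body (including the /admin/ path) and the silent_blocks helper are illustrative assumptions; the status codes are the ones reported in this section.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: a single wildcard group, as the report describes.
# The site's actual Disallow paths are not reproduced here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

# Status codes as reported in this section for a subset of the tested user agents.
OBSERVED_STATUS = {
    "GPTBot": 403,
    "ClaudeBot": 403,
    "PerplexityBot": 403,
    "anthropic-ai": 200,
    "Perplexity-User": 200,
}


def silent_blocks(robots_txt, observed, url):
    """Return user agents that robots.txt permits but that received a 403."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return sorted(
        bot
        for bot, status in observed.items()
        if parser.can_fetch(bot, url) and status == 403
    )


blocked = silent_blocks(ROBOTS_TXT, OBSERVED_STATUS, "https://www.capstonepub.com/")
print(blocked)
```

Under these assumptions the script lists ClaudeBot, GPTBot, and PerplexityBot: each is allowed by the wildcard rule yet rejected at the edge, which is the silent-block condition.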

Cold-Knowledge Gap

An LLM queried cold (that is, answering from prior knowledge alone) describes Capstone as a children's publisher serving pre-K through grade 8 and names imprints such as *Pebble Plus*, *Stone Arch Books*, *Compass Point Books*, and *Picture Window Books*. The actual site positions itself as "the nation's leading educational publisher of K-5 digital solutions", a narrower grade band than the model assumes, and heavily emphasizes PebbleGo as its flagship product ("used in 25% of U.S. elementary schools by over 3 million kids"). The model's prior knowledge is stale on product scope: it cites *Pebble Plus* as a notable series, while the site now lists *Pebble Sprout*, *Pebble Emerge*, *Pebble Explore*, and *Brain Candy Books* as current offerings. The model also shows no knowledge of Stairway Decodables or Capstone Connect, both prominently featured on the homepage alongside Brain Candy Books.

Schema Posture

Every page examined (homepage, about-us, FAQ, newsroom, imprints, classroom programs) returns zero JSON-LD schema of any type. The homepage has no Organization, WebSite, Book, or Product schema. The FAQ page (/support/faq) contains a rich set of Q&A content but no FAQPage schema. The imprint pages describe specific book series with sales figures ("more than 2.5 million copies sold worldwide" for You Choose) but carry no structured data to surface those facts to knowledge graphs. This is a complete schema vacuum across a site with thousands of products and substantial brand authority.
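
To illustrate what the missing FAQPage markup could look like, here is a minimal Python sketch that builds a schema.org FAQPage object from question and answer pairs and wraps it in a JSON-LD script element. The helper names (faq_jsonld, as_script_tag) and the placeholder Q&A text are assumptions for illustration; the real entries would come from the content at /support/faq, and how the markup is injected into Drupal templates is outside this sketch.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize a JSON-LD object into a script element for the page head."""
    # Escape "</" so answer text can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Placeholder content only; not taken from the site's FAQ.
example = faq_jsonld([("Placeholder question?", "Placeholder answer.")])
print(as_script_tag(example))
```

The same pattern extends to Organization or Book objects by changing the @type and properties; the output should be checked with a structured-data validator before deployment.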

External Signals

DNS records reveal that Capstone has proactively registered verification tokens with OpenAI (openai-domain-verification), Anthropic (anthropic-domain-verification), and Apple (apple-domain-verification), indicating awareness of AI crawler ecosystems. Yet the CloudFront WAF blocks the very crawlers those tokens are meant to authenticate. The shop subdomain (shop.capstonepub.com) is a JS-rendered Salesforce Commerce application that returns only 8 words of visible text to a plain GET — AI crawlers that do get through (like anthropic-ai) cannot access product data, ISBNs, or pricing. The newsroom page contains press releases and media mentions (NPR, eSchool News, District Administration) that serve as strong third-party credibility signals, but none of this is surfaced through structured data.
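
The "8 words of visible text" figure comes from counting the text a plain GET returns before any JavaScript runs. The report's exact counting method is not stated; the sketch below is one rough approximation in Python using the standard library's html.parser, ignoring script, style, noscript, template, and title content. The SHELL markup is a hypothetical stand-in for a JS-rendered page, not the shop's actual HTML.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect words that are not inside non-rendered elements."""

    SKIP = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside skipped elements
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.words.extend(data.split())


def visible_word_count(html):
    """Count whitespace-separated words in the rendered-text portion of html."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


# Hypothetical app shell: almost all content would be injected client-side.
SHELL = (
    "<html><head><title>Shop</title><script>window.app = {};</script></head>"
    '<body><div id="root"></div><noscript>Enable JavaScript to continue.</noscript>'
    "<p>Loading store</p></body></html>"
)
print(visible_word_count(SHELL))
```

A server-rendered product page run through the same function would return a count in the hundreds, which is the gap that server-side or dynamic rendering is meant to close.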

Findings

  1. Nearly all major AI crawlers blocked by CloudFront WAF High

    GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, PerplexityBot, Applebot-Extended, ChatGPT-User, and Bytespider all receive HTTP 403 responses from CloudFront. Only anthropic-ai and Perplexity-User get through. The robots.txt contains no AI-bot directives, making the blockade invisible to crawler operators.

    What to change: Remove the CloudFront WAF rules that block AI crawlers, or update robots.txt to explicitly disallow them if blocking is intentional. Ensure the WAF configuration aligns with the declared robots.txt policy.

  2. Robots.txt contains no AI crawler directives High

    The robots.txt at www.capstonepub.com/robots.txt has a single User-agent: * rule with no mention of GPTBot, ClaudeBot, Google-Extended, or any other AI crawler. This gives the false impression that AI crawlers are allowed, while the WAF silently blocks them.

    What to change: Add explicit directives for AI crawlers (e.g., Disallow for those that should be blocked, or remove WAF blocks for those that should be allowed) to make the site's policy transparent.

  3. No JSON-LD schema on any page examined High

    Every page checked — homepage, about-us, FAQ, newsroom, imprints, classroom programs — returns zero JSON-LD schema. The FAQ page has rich Q&A content but no FAQPage schema. Imprint pages describe book series with sales figures but no Product or Book schema.

    What to change: Add Organization, WebSite, and Book schema to relevant pages. Add FAQPage schema to the FAQ page. Add Product schema to product pages.

  4. Cold LLM knowledge is stale on current product lineup Medium

    The LLM queried cold describes Capstone as serving pre-K through grade 8 and mentions Pebble Plus, but the site now focuses on K-5 digital solutions with PebbleGo as flagship, and current offerings include Pebble Sprout, Pebble Emerge, Pebble Explore, Brain Candy Books, Stairway Decodables, and Capstone Connect — none of which appear in the model's prior knowledge.

    What to change: Publish an llms.txt file and ensure key product pages contain structured data and clear text descriptions to help AI models stay current.

  5. Shop subdomain returns only 8 words of visible text High

    The shop subdomain (shop.capstonepub.com) is a JS-rendered Salesforce Commerce application that returns only 8 words of visible text to a plain GET. AI crawlers that do get through cannot access product data, ISBNs, or pricing.

    What to change: Implement server-side rendering or dynamic rendering for the shop subdomain to expose product content to crawlers.

  6. Domain verification tokens for AI companies exist but crawlers are blocked Medium

    DNS records show verification tokens for OpenAI, Anthropic, and Apple, indicating awareness of AI crawler ecosystems. Yet the CloudFront WAF blocks the very crawlers those tokens are meant to authenticate, creating a contradictory posture.

    What to change: Align the WAF configuration with the domain verification tokens: either allow the verified crawlers or remove the tokens.

  7. No llms.txt file published Medium

    The site does not provide an llms.txt file, which would give AI models a curated overview of the site's content and help bridge the cold-knowledge gap.

    What to change: Create an llms.txt file at the root domain listing key pages and a brief description of the site's offerings.

  8. Newsroom press releases lack structured data Medium

    The newsroom page contains press releases and media mentions from NPR, eSchool News, and District Administration, but none of this third-party credibility is surfaced through structured data such as NewsArticle schema.

    What to change: Add NewsArticle schema to press releases. For media citations, use existing schema.org properties such as mentions or subjectOf; schema.org defines no Mention type.

  9. Sitemap has no index and only 80 URLs Medium

    The sitemap at capstonepub.com/sitemap.xml contains only 80 URLs and is not an index sitemap. For a site with thousands of products, this likely omits many pages from crawling consideration.

    What to change: Generate a comprehensive sitemap index that includes all product and category pages, and ensure it is submitted to search engines.

  10. Web searches for Capstone products return zero results Low

    Searches for "Capstone Publishing children's books PebbleGo reviews" and "capstonepub.com PebbleGo literacy" returned zero results, which may indicate low external visibility and citation of the site's content.

    What to change: Improve SEO and content marketing to increase the site's presence in search results and AI training data.
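
For finding 9, a sitemap index is a small XML file that points to child sitemaps, each of which may hold up to 50,000 URLs under the sitemaps.org protocol. The Python sketch below generates one with the standard library. The child sitemap URLs are hypothetical examples; the site's actual sitemap layout (and whatever Drupal module would produce it) is not known from this report.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document listing the given child sitemap URLs."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical child sitemaps, split by content type.
index_xml = build_sitemap_index(
    [
        "https://www.capstonepub.com/sitemap-pages.xml",
        "https://www.capstonepub.com/sitemap-products.xml",
    ]
)
print(index_xml)
```

Splitting by content type keeps each child file small and makes it easier to see which sections of the site are being picked up.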

What's working

  • Anthropic user agent (anthropic-ai) is allowed and receives full content: Requests identifying as anthropic-ai receive a 200 response with full page content, so the site's text is reachable under that user agent.
  • Domain verification tokens registered with OpenAI, Anthropic, and Apple: DNS records show verification tokens for major AI companies, indicating proactive engagement with AI ecosystems.
  • FAQ page contains rich Q&A content: The FAQ page at /support/faq has 1,296 words of detailed Q&A content that could be surfaced via FAQPage schema.
  • Imprint pages include compelling sales figures: Pages like /about-us/our-imprints mention specific sales numbers (e.g., "more than 2.5 million copies sold worldwide" for You Choose), which are strong credibility signals if structured.
  • Newsroom page features third-party media mentions: The newsroom includes press mentions from NPR, eSchool News, and District Administration, providing external credibility signals.
  • PebbleGo flagship product prominently featured on homepage: The homepage highlights PebbleGo as "used in 25% of U.S. elementary schools by over 3 million kids", a strong brand signal.
