whatnot.com — AI Site Grade
Whatnot's entire web domain is invisible to every major AI crawler due to Cloudflare blocks, despite permissive robots.txt and domain verification tokens.
Whatnot's $3.7B marketplace has zero AI-crawler-accessible content on its primary domain because Cloudflare blocks all AI bots, while sitemaps are hidden and structured data is absent.
- Findings: 12
- Evidence checks: 28
- Completed: 30 May 2026
Analysis
Whatnot's entire web domain is invisible to every major AI crawler
Every AI crawler tested (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User, Applebot-Extended, Bytespider, anthropic-ai, Perplexity-User) receives a Cloudflare 403 on the www.whatnot.com homepage and the /seller page, and GPTBot is likewise blocked on a sampled listing page. The robots.txt allows all bots to crawl / (no AI-specific rules exist), but Cloudflare's WAF rejects the requests at the edge, so that permission never takes effect. The result: a $3.7B marketplace with 75+ sitemap files pointing to millions of URLs has zero AI-crawler-accessible content on its primary domain.
Crawler Access
The robots.txt at www.whatnot.com/robots.txt uses a single User-agent: * rule with Allow: / and no AI-bot-specific directives. No GPTBot, ClaudeBot, Google-Extended, or any other AI crawler is mentioned. Despite this permissive stance, compare_bot_access on both the homepage and the /seller page returned 403 for all 10 AI user-agents tested. The only user-agent that gets a 200 is a standard browser. The block is Cloudflare-level (the response body is a "Just a moment..." challenge page), not a server-side application block. The /llms.txt returns a 404 (serving the Next.js app shell). The standard /sitemap.xml also 404s — sitemaps are served from non-standard .txt paths (/listings-sitemap-index.txt, /seller-sitemap.txt, /users-sitemap-index.txt), which standard crawler discovery won't find.
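The access comparison described above can be approximated with a short script. This is a minimal sketch, not the compare_bot_access check itself: it assumes bare user-agent tokens (real crawlers send longer strings), treats a 403 whose body contains "Just a moment..." as an edge challenge, and uses illustrative function names.

```python
from urllib import error, request

# Assumption: simplified user-agent tokens; real crawlers send longer strings.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider"]
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def classify(status: int, body: str) -> str:
    """Label a response as 'ok', 'edge-challenge', 'blocked', or 'other'."""
    if status == 200:
        return "ok"
    if status == 403 and "Just a moment..." in body:
        # A challenge page in the body points at the CDN edge, not the app.
        return "edge-challenge"
    if status == 403:
        return "blocked"
    return "other"


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, str]:
    """GET url with the given User-Agent and return (status, body)."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return exc.code, exc.read().decode("utf-8", errors="replace")


def compare(url: str, agents: list[str]) -> dict[str, str]:
    """Map each user-agent to the label of the response it receives."""
    return {agent: classify(*fetch(url, agent)) for agent in agents}
```

Calling compare("https://www.whatnot.com/seller", AI_USER_AGENTS + [BROWSER_UA]) issues live requests, so results may differ from this snapshot if the edge rules have changed.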
Content & Schema
The homepage is a JavaScript shell — a plain GET extracts 0 words of visible text. The /seller page, by contrast, contains ~900 words of substantive content (seller fees, shipping details, FAQ with 5 questions), but it is only accessible to browsers. The /about-us page is thin (~94 words). No JSON-LD structured data exists anywhere on the homepage, /seller, or /about-us. The blog at blog.teamwhatnot.com (a separate Squarespace subdomain) does have WebSite and LocalBusiness schema, and is fully accessible to all AI crawlers except Bytespider — but the blog is not the core marketplace.
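Both measurements in this section (visible words from a plain GET, and presence of JSON-LD) can be reproduced on raw HTML without a browser. The sketch below is one way to do it with the standard library, under the assumption that "visible text" means text outside script, style, template, and title elements; the class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect visible words and JSON-LD payloads from unrendered HTML."""

    # Assumption: text inside these elements is not counted as page text.
    SKIP = {"script", "style", "template", "title"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_stack: list[str] = []
        self._in_jsonld = False
        self.words: list[str] = []
        self.jsonld_raw: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_stack.append(tag)
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self._in_jsonld = True
                self.jsonld_raw.append("")

    def handle_endtag(self, tag):
        if self._skip_stack and self._skip_stack[-1] == tag:
            self._skip_stack.pop()
            if tag == "script":
                self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_raw[-1] += data
        elif not self._skip_stack:
            self.words.extend(data.split())


def audit(html: str) -> tuple[int, list[str]]:
    """Return (visible word count, JSON-LD @type values found)."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    types: list[str] = []
    for raw in parser.jsonld_raw:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the audit
        for item in doc if isinstance(doc, list) else [doc]:
            if isinstance(item, dict) and "@type" in item:
                types.append(str(item["@type"]))
    return len(parser.words), types
```

A JavaScript shell whose body holds only an empty mount element yields a word count of 0 and no types, which matches what the homepage check reported.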
Cold-Knowledge Gap
The LLM cold-knowledge query returned a detailed, accurate description: Whatnot is a live-streaming marketplace founded in 2019, valued at $3.7B, focused on collectibles and trading cards, with investors including a16z and Y Combinator. It also surfaced reputational signals about counterfeit goods scrutiny and 2023 layoffs. The cold model knows more about the brand than any AI crawler could learn from the live site today. The gap is not that the model is wrong — it's that the model's knowledge comes from pre-training data (news articles, press coverage), not from the site's own content. The site's actual value proposition ("the largest live shopping app in North America and Europe," $6B+ in 2025 sales, Shopify integration, 95 minutes daily user engagement) is entirely absent from what a live-retrieval AI would see.
External Signals
The blog at blog.teamwhatnot.com (Squarespace-hosted) is the only content in the Whatnot ecosystem that is broadly accessible to AI crawlers. It contains rich product announcements: a Shopify integration, a $225M Series F raise, a 2026 State of Live Selling report, and a Seller Summit event. The DNS TXT records reveal an openai-domain-verification token and an anthropic-domain-verification token, indicating that Whatnot has verified its domain with both OpenAI and Anthropic (presumably for API or enterprise access), yet neither company's crawler can read the main site. The help center at help.whatnot.com/hc/ is also Cloudflare-blocked for all bots except ClaudeBot (which gets a 200 there, oddly).
Findings
Cloudflare blocks all AI crawlers across the entire domain (High)
Every major AI crawler (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.) receives a 403 Cloudflare challenge on the www.whatnot.com homepage and /seller page, and GPTBot is also blocked on a sampled listing page. Only a standard browser user-agent gets a 200 response.
What to change: Configure Cloudflare WAF to allow AI crawlers (GPTBot, ClaudeBot, etc.) to access the site, or serve static HTML versions to bots.
robots.txt lacks AI-specific directives (Medium)
The robots.txt at www.whatnot.com/robots.txt uses a single User-agent: * rule with Allow: / and does not mention any AI crawler by name. This permissive stance is overridden by Cloudflare, and because no AI crawler is named, the file does not record which AI bots Whatnot actually intends to admit.
What to change: Add explicit Allow or Disallow rules for AI crawlers (GPTBot, ClaudeBot, etc.) to align with the intended access policy.
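To see how explicit rules would read to a crawler, the policy can be checked with Python's standard urllib.robotparser. The sketch below compares the current wildcard policy with a hypothetical explicit one; which bots to allow or refuse is an assumption made for illustration, not a recommendation from the evidence. Note that robots.txt only states intent, so the Cloudflare block still has to be lifted separately.

```python
from urllib.robotparser import RobotFileParser

# The policy described in this report: one wildcard rule, no named AI bots.
CURRENT_POLICY = """\
User-agent: *
Allow: /
"""

# Hypothetical explicit policy (assumption): the allow/deny choices are
# placeholders for whatever access policy the site owner actually intends.
EXPLICIT_POLICY = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under the current policy every bot is allowed by the wildcard rule; under the explicit variant each named bot gets its own documented answer, and unnamed bots still fall through to the wildcard.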
llms.txt file returns 404 (Medium)
The /llms.txt endpoint returns a 404 (serving the Next.js app shell), so AI crawlers cannot discover a curated summary of the site's content.
What to change: Create an llms.txt file with a plain-text overview of the marketplace, key pages, and sitemap locations.
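A starting point for the file could be generated from a small page inventory. The sketch below follows the commonly used llms.txt layout (a title heading, a blockquote summary, then sections of annotated links); the section names, link titles, and notes are placeholders, and only the URLs and the quoted tagline come from this report.

```python
def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body: title, blockquote summary, link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Illustrative inventory (assumption): titles and notes are placeholders.
LLMS_TXT = build_llms_txt(
    "Whatnot",
    "The largest live shopping app in North America and Europe.",
    {
        "Key pages": [
            ("Sell on Whatnot", "https://www.whatnot.com/seller",
             "Seller fees, shipping details, FAQ"),
            ("About", "https://www.whatnot.com/about-us", "Company overview"),
            ("Help center", "https://help.whatnot.com/hc/", "Support content"),
            ("Blog", "https://blog.teamwhatnot.com", "Product announcements"),
        ],
        "Sitemaps": [
            ("Listings", "https://www.whatnot.com/listings-sitemap-index.txt",
             "Listing URLs"),
        ],
    },
)
```

The generated text would be served as plain text at /llms.txt, which in turn requires that path to be reachable by the crawlers it is meant for.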
Standard sitemap.xml returns 404 (High)
The standard /sitemap.xml returns a 404, preventing crawlers from discovering the site's sitemaps. Sitemaps are served from non-standard .txt paths (e.g., /listings-sitemap-index.txt) that standard crawler discovery won't find.
What to change: Serve a standard sitemap.xml that references the existing .txt sitemap indexes, or add a robots.txt Sitemap directive pointing to the index files.
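Either option can be produced from the three paths found in this audit. The sketch below builds a sitemap index document and the matching robots.txt Sitemap lines. It assumes each listed file is itself a valid sitemap; if the -index.txt files are really lists of further sitemaps, their children would likely need to be referenced instead, which this report did not verify.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE = "https://www.whatnot.com"

# Paths observed in this audit; the https scheme and host are assumed.
EXISTING_PATHS = [
    "/listings-sitemap-index.txt",
    "/seller-sitemap.txt",
    "/users-sitemap-index.txt",
]


def build_sitemap_index(urls: list[str]) -> str:
    """Return a sitemap index XML document with one <sitemap> per URL."""
    root = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_sitemap_lines(urls: list[str]) -> list[str]:
    """Return robots.txt Sitemap directives, one per URL."""
    return [f"Sitemap: {url}" for url in urls]


SITEMAP_XML = build_sitemap_index([BASE + path for path in EXISTING_PATHS])
```

The robots.txt route is the smaller change, since it needs no new file and only adds lines to one that already returns 200.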
Homepage renders as a JavaScript shell with zero visible text (High)
A plain GET of the homepage extracts 0 words of visible text, meaning AI crawlers that cannot execute JavaScript see an empty page.
What to change: Implement server-side rendering (SSR) or static prerendering for the homepage so that crawlers receive meaningful HTML content.
No JSON-LD structured data on key pages (High)
The homepage, /seller, and /about-us pages contain no JSON-LD structured data. This prevents AI crawlers from understanding the site's entity relationships (e.g., marketplace, seller, product).
What to change: Add JSON-LD structured data (e.g., WebSite, Organization, Product, FAQPage) to all key pages.
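As an illustration of the markup involved, the sketch below builds schema.org FAQPage and Organization objects and wraps them in a script tag for server-rendered HTML. The helper names are hypothetical, and the question and answer text must come from the real /seller FAQ; nothing here reproduces the site's actual wording.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def organization_jsonld(name: str, url: str, same_as: list[str]) -> dict:
    """Build a schema.org Organization object."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }


def as_script_tag(data: dict) -> str:
    """Serialize data as a JSON-LD script element for embedding in HTML."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


ORG_TAG = as_script_tag(
    organization_jsonld(
        "Whatnot", "https://www.whatnot.com", ["https://blog.teamwhatnot.com"]
    )
)
```

The tag only helps if it is present in the HTML that crawlers receive, so it depends on the rendering and access fixes listed in the other findings.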
Seller page with substantive content is blocked for AI crawlers (High)
The /seller page contains ~900 words of substantive content (seller fees, shipping details, FAQ), but it is only accessible to browsers. AI crawlers receive a 403.
What to change: Allow AI crawlers to access the /seller page by adjusting Cloudflare WAF rules.
Listing pages blocked for GPTBot despite browser accessibility (High)
A sample listing page returns 403 for GPTBot but 200 for a standard browser, which suggests that product listings are invisible to AI crawlers in general.
What to change: Allow AI crawlers to access listing pages by adjusting Cloudflare WAF rules.
Domain verification tokens exist but crawlers are still blocked (Medium)
DNS TXT records include openai-domain-verification and anthropic-domain-verification tokens, indicating proactive registration with OpenAI and Anthropic. However, both companies' crawlers are blocked by Cloudflare, so the verification does nothing to make the site's content readable by them.
What to change: Ensure that Cloudflare WAF allows GPTBot and ClaudeBot to access the site, matching the domain verification intent.
LLM cold knowledge is richer than live-retrievable content (Medium)
The LLM's pre-training knowledge includes accurate details about Whatnot (founded 2019, $3.7B valuation, investors, etc.), but the live site provides none of its current value proposition (e.g., $6B+ sales, Shopify integration) to AI crawlers. This gap means retrieval-augmented generation (RAG) systems cannot pull up-to-date information from the site itself.
What to change: Make key value propositions and current data accessible to AI crawlers by unblocking pages and adding structured data.
Help center is blocked for most AI crawlers (Medium)
The help center at help.whatnot.com/hc/ returns 403 for all AI crawlers except ClaudeBot (which gets a 200). This limits AI access to support content.
What to change: Allow all AI crawlers to access the help center by adjusting Cloudflare WAF rules.
About Us page has thin content (Low)
The /about-us page contains only ~94 words, providing minimal information about the company. This limits the page's value for AI crawlers seeking context.
What to change: Expand the About Us page with more detailed company information, history, and mission.
What's working
- Blog subdomain is accessible to most AI crawlers: The blog at blog.teamwhatnot.com (Squarespace-hosted) is accessible to every AI crawler tested except Bytespider, providing rich content about product announcements and company news.
- Blog includes WebSite and LocalBusiness schema: The blog page contains JSON-LD structured data (WebSite and LocalBusiness), which helps AI crawlers understand the site's entity relationships.
- Domain verification tokens registered with OpenAI and Anthropic: DNS TXT records include openai-domain-verification and anthropic-domain-verification tokens, indicating proactive steps to enable AI access.
- Seller page contains substantive content when accessible: The /seller page has ~900 words of useful content (fees, shipping, FAQ) that would be valuable to AI crawlers if unblocked.
- Sitemap files exist at non-standard paths: Multiple sitemap index files (listings, seller, users) are served as .txt files, indicating the site has structured URL organization that could be leveraged.
- robots.txt is permissive for all crawlers: The robots.txt allows all crawlers to access the entire site with Allow: /, showing an intent to be crawled.
- Cloudflare provides robust security and performance: The site uses Cloudflare for CDN and WAF, which protects against DDoS and abuse, though its current rules also block AI crawlers.
- Help center accessible to ClaudeBot: The help center at help.whatnot.com/hc/ is accessible to ClaudeBot (200), indicating partial AI access.