AI Site Grade

healthgrades.com — AI Site Grade

Healthgrades.com returns 403 to every AI crawler, blocking all content, schema, and sitemap access.

Healthgrades.com is entirely inaccessible to AI crawlers due to geo-blocking, preventing models from verifying or updating any physician or hospital data.

Findings: 9
Evidence checks: 46
Completed: 30 May 2026

Analysis

Geo-Blocked Site Returns 403 to Every AI Crawler

Healthgrades.com serves a 403 "not available in your area" page to every AI crawler tested: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, ChatGPT-User, OAI-SearchBot, Applebot-Extended, Bytespider, and anthropic-ai all receive the identical geo-block page. The robots.txt and llms.txt endpoints also return 403 HTML pages instead of the expected text files, so no crawler can read the site's access rules. The site is served through AWS CloudFront (AmazonS3 origin), and its responses carry no security headers (no HSTS, no CSP, no X-Frame-Options).
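
The crawler check described above could be reproduced with a sketch along these lines. It is a minimal illustration, not the tool that produced this report: the `probe` and `all_blocked` helpers and the 10-second timeout are assumptions, and real crawlers send longer User-Agent strings than the bare tokens used here.

```python
from typing import Callable, Dict, Iterable, Tuple
import urllib.error
import urllib.request

# Crawler tokens named in the report, sent here as bare User-Agent values.
AI_AGENTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "ChatGPT-User",
    "OAI-SearchBot", "Applebot-Extended", "Bytespider", "anthropic-ai",
]

# A fetcher takes (url, user_agent) and returns (status_code, content_type).
Fetch = Callable[[str, str], Tuple[int, str]]


def http_fetch(url: str, user_agent: str) -> Tuple[int, str]:
    """Issue one GET with the given User-Agent and report status and type."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions but still carry headers.
        return err.code, err.headers.get("Content-Type", "")


def probe(url: str, agents: Iterable[str],
          fetch: Fetch = http_fetch) -> Dict[str, Tuple[int, str]]:
    """Request the same URL once per agent and collect the results."""
    return {agent: fetch(url, agent) for agent in agents}


def all_blocked(results: Dict[str, Tuple[int, str]]) -> bool:
    """True when every probed agent received a 403."""
    return bool(results) and all(status == 403 for status, _ in results.values())
```

Calling `probe("https://www.healthgrades.com/robots.txt", AI_AGENTS)` would issue live requests. Because the block is geographic, results are likely to depend on where the requests originate, and network failures other than HTTP errors are not handled in this sketch.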

Cold-Knowledge Gap

LLMs describe Healthgrades as a "leading online resource" founded in 1998, headquartered in Denver, acquired by Red Ventures in 2019, with "over 30 million monthly unique visitors" and "more than 13 million patient ratings." The Wayback snapshot of the homepage confirms the tagline "Half of all Americans who see doctors each year use Healthgrades." The gap: the live site is entirely inaccessible to AI crawlers, so models cannot verify or update any of this information. The cold knowledge already mentions "criticism over the accuracy and potential bias of its physician ratings," a reputational signal the site cannot counter with its own content while AI systems are locked out.

Schema and Content Posture

The Wayback snapshot of the homepage (December 2023) reveals zero JSON-LD schema of any type. No Physician, Hospital, LocalBusiness, Review, or AggregateRating structured data was present. The page uses a single H1 ("Find a Doctor") and multiple H2 headings for navigation sections. The site has FAQ, table, and list answer signals in its archived content, suggesting it could support rich results, but the absence of schema means AI engines cannot extract structured entity data. The sitemap.xml is also geo-blocked, so crawlers cannot discover the URL inventory.
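
As a rough illustration of the structured data the archived page lacks, the sketch below builds a schema.org `Physician` object with a nested `AggregateRating` and wraps it in a JSON-LD script tag. The helper names, the placeholder doctor, the URL, and the rating figures are all hypothetical; none of them come from Healthgrades.

```python
import json
from typing import Any, Dict


def physician_jsonld(name: str, specialty: str, profile_url: str,
                     rating_value: float, rating_count: int) -> Dict[str, Any]:
    """Build a schema.org Physician entity with an aggregate rating."""
    return {
        "@context": "https://schema.org",
        "@type": "Physician",
        "name": name,
        "medicalSpecialty": specialty,
        "url": profile_url,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "ratingCount": rating_count,
        },
    }


def as_script_tag(data: Dict[str, Any]) -> str:
    """Serialize the entity for embedding in an HTML page."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a value can never close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


# Placeholder values for illustration only.
tag = as_script_tag(physician_jsonld(
    "Dr. Example Placeholder", "Cardiovascular",
    "https://example.test/physician/placeholder", 4.5, 12,
))
```

Markup like this only helps once the pages that carry it are reachable, so it would follow the access fixes rather than replace them.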

External Signals and Fragmentation

The site's DNS records show 30+ verification tokens (including Google, Facebook, Adobe, Atlassian, Wiz, Yandex, and HubSpot), indicating a complex third-party integration stack. The SPF record references Mandrill, HubSpot, Google, and Outlook, which points to a fragmented email ecosystem. No external reviews, Reddit threads, or press articles were surfaced in search results during the investigation, suggesting the site's off-domain footprint is thin relative to its claimed 30M monthly visitors. The Wayback archive shows the site has been captured 270+ times since 2004, but individual doctor profile pages are not archived, meaning AI training data likely contains no actual physician-level content from Healthgrades.
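
The SPF observation can be made concrete with a small parser. The sketch below lists the `include:` targets in a record and counts the top-level terms that trigger DNS lookups, which RFC 7208 caps at 10 per evaluation. The record shown is a hypothetical stand-in with placeholder domains, since the report does not quote the actual SPF string, and nested includes are not followed.

```python
from typing import List

# Hypothetical record for illustration; not the real healthgrades.com SPF.
EXAMPLE_SPF = (
    "v=spf1 include:mail-a.example include:mail-b.example "
    "include:mail-c.example include:mail-d.example ~all"
)

# Mechanisms that cost a DNS lookup under RFC 7208, section 4.6.4.
LOOKUP_MECHANISMS = ("include", "a", "mx", "ptr", "exists")


def spf_includes(record: str) -> List[str]:
    """Return the domains referenced by include: mechanisms, in order."""
    domains = []
    for term in record.split():
        stripped = term.lstrip("+-~?")  # drop any qualifier prefix
        if stripped.lower().startswith("include:"):
            domains.append(stripped.split(":", 1)[1])
    return domains


def lookup_terms(record: str) -> int:
    """Count top-level terms that require a DNS lookup."""
    count = 0
    for term in record.split()[1:]:  # skip the v=spf1 version tag
        stripped = term.lstrip("+-~?").lower()
        name = stripped.split(":", 1)[0].split("/", 1)[0]
        if name in LOOKUP_MECHANISMS or stripped.startswith("redirect="):
            count += 1
    return count
```

Each included provider can add further lookups of its own, so a record with four includes may sit closer to the limit than the top-level count suggests.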

Findings

  1. All AI crawlers receive 403 geo-block wall (severity: High)

    Every tested AI crawler (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, ChatGPT-User, OAI-SearchBot, Applebot-Extended, Bytespider, anthropic-ai) receives a 403 'not available in your area' page. The site is entirely inaccessible to AI crawlers.

    What to change: Remove geo-blocking for AI crawler user agents or serve them a static, crawlable version of the site.

  2. robots.txt returns 403 HTML page (severity: High)

    The robots.txt endpoint returns a 403 HTML page instead of the expected plain text file, preventing crawlers from reading access rules.

    What to change: Configure the server to serve robots.txt as a plain text file with appropriate access rules.

  3. llms.txt endpoint returns 403 (severity: High)

    The llms.txt endpoint returns a 403 HTML page, so AI crawlers cannot discover AI-specific guidance.

    What to change: Create and serve a valid llms.txt file: a short Markdown summary of the site with links to its key pages, as the llms.txt proposal describes.

  4. Sitemap.xml returns 403 (severity: High)

    The sitemap.xml endpoint is geo-blocked, so crawlers cannot discover the site's URL inventory.

    What to change: Serve sitemap.xml without geo-blocking to allow crawler discovery.

  5. Homepage lacks any JSON-LD structured data (severity: High)

    The archived homepage (December 2023) contains zero JSON-LD schema for Physician, Hospital, LocalBusiness, Review, or AggregateRating. AI engines cannot extract structured entity data.

    What to change: Add JSON-LD structured data for physicians, hospitals, reviews, and aggregate ratings on relevant pages.

  6. Doctor profile pages are not archived in Wayback Machine (severity: Medium)

    Individual doctor profile pages are not captured in the Wayback Machine, meaning AI training data likely contains no actual physician-level content from Healthgrades.

    What to change: Ensure doctor profile pages are publicly crawlable and archived to build AI visibility.

  7. No security headers on homepage (severity: Low)

    The homepage lacks HSTS, CSP, and X-Frame-Options headers. This is a general security-hygiene gap; whether it affects how AI crawlers weigh the site is unconfirmed.

    What to change: Add HSTS, CSP, and X-Frame-Options headers to improve security posture.

  8. Thin off-domain footprint relative to claimed traffic (severity: Medium)

    No external reviews, Reddit threads, or press articles were surfaced in search results, suggesting the site's off-domain footprint is thin relative to its claimed 30M monthly visitors.

    What to change: Build external backlinks and mentions through PR, partnerships, and content marketing.

  9. Complex third-party integration stack with fragmented email (severity: Low)

    DNS records show 30+ verification tokens and SPF referencing Mandrill, HubSpot, Google, and Outlook, indicating a complex and potentially fragmented email ecosystem.

    What to change: Consolidate email sending services to simplify SPF and improve deliverability.
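
For finding 2, a replacement robots.txt can be checked offline before deployment. The sketch below uses Python's standard-library `urllib.robotparser` to confirm which agents a draft file admits. The draft rules, the `/account/` path, and the `allowed_agents` helper are hypothetical examples, not a recommended policy for Healthgrades.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /account/
"""


def allowed_agents(robots_text: str, agents: Iterable[str],
                   url: str) -> Dict[str, bool]:
    """Report, per agent, whether the draft rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    EXAMPLE_ROBOTS,
    ["GPTBot", "OtherBot"],
    "https://example.test/account/settings",
)
```

The same parser's `read()` method treats a 401 or 403 response for robots.txt as disallow-all, which suggests that a 403 on this endpoint can be read by some clients as a blanket refusal rather than as missing rules.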

What's working

  • LLMs have strong cold knowledge of Healthgrades brand — LLMs describe Healthgrades as a leading healthcare platform with 30M monthly visitors and 13M patient ratings, indicating strong brand recognition.
  • Homepage uses clear heading hierarchy — The archived homepage uses a single H1 ('Find a Doctor') and multiple H2 headings for navigation, providing a clear content structure.
  • Archived content includes FAQ and list answer signals — The archived homepage contains FAQ, table, and list answer signals, which could support rich results if schema were added.
  • Site has extensive Wayback Machine archive history — The site has been captured 270+ times since 2004, providing a long historical record for AI training data.
