AI Site Grade
hanoverresearch.com — AI Site Grade
Hanover Research selectively blocks GPTBot, Google-Extended, and ClaudeBot while allowing ChatGPT-User and PerplexityBot, creating a split AI crawler reality.
Hanover Research's server-level bot filtering blocks most major AI training crawlers but permits retrieval bots, while the site lacks Organization schema, has a broken blog link, and omits key brand signals from its content.
- Findings: 10
- Evidence checks: 27
- Completed: 30 May 2026
Analysis
Hanover Research: AI crawlers see a split reality — most major AI bots are silently disconnected while a select few get full content
Crawler Access
The site runs on nginx (Google Cloud IP 35.212.105.76) with strong security headers (HSTS, CSP, X-Frame-Options) but no Cloudflare or explicit WAF branding. The robots.txt is minimal: it blocks only /wp-admin/ and /style-guide/ for User-agent: * and allows Scrapy broadly. The server itself, however, enforces UA-based blocking at the connection level. Requests sent as GPTBot, Google-Extended, ClaudeBot, anthropic-ai, Applebot-Extended, and Bytespider all fail with RemoteProtocolError: Server disconnected without sending a response (status 0, zero bytes). Meanwhile, ChatGPT-User, OAI-SearchBot, PerplexityBot, and Perplexity-User get the full 598KB homepage with a 200 status, identical to a browser. The pattern points to a server-side WAF or nginx rule that silently drops connections from certain bot UAs, not a robots.txt block. The /llms.txt URL returns the WordPress 404 page (298KB of HTML/CSS/JS), so no llms.txt is published.
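The per-user-agent comparison described above can be reproduced with a small probe. This is a minimal sketch using only the Python standard library, not the tool that produced this report. It assumes that sending the bare crawler token as the User-Agent header is enough to trigger the filter (real crawlers send longer strings), and the names fetch, classify, and probe_all are illustrative.

```python
import http.client
import urllib.error
import urllib.request

# User-agent tokens named in this report. Sending only the bare token is a
# simplifying assumption; real crawlers send longer strings.
USER_AGENTS = ["GPTBot", "ClaudeBot", "ChatGPT-User", "PerplexityBot"]


def fetch(url, user_agent, timeout=15):
    """Return (status, byte_count); status 0 means no HTTP response arrived."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return err.code, len(err.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection dropped, reset, or timed out before any response.
        return 0, 0


def classify(status, byte_count):
    """Label one probe result in the terms used by this report."""
    if status == 0:
        return "dropped"
    if status == 200 and byte_count > 0:
        return "served"
    return f"http-{status}"


def probe_all(url, agents=USER_AGENTS, fetcher=fetch):
    """Probe url once per user agent and return {agent: label}."""
    return {ua: classify(*fetcher(url, ua)) for ua in agents}
```

Calling probe_all("https://hanoverresearch.com/") would issue one live request per agent. The fetcher parameter exists so the classification logic can be exercised without network access.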
Content & Schema Posture
The homepage and all sub-pages (corporate, higher-ed, K-12) are content-rich WordPress pages with 1,000-1,800 words of visible text and no JS-rendering risk. JSON-LD schema is present via Yoast SEO but limited to WebPage, WebSite, BreadcrumbList, ImageObject, and Article types. No Organization schema with logo, social profiles, or founding date exists on any page fetched. The report page (the-state-of-market-research-2025) uses FAQPage schema with actual Q&A pairs, a strong answer-format signal. However, the main solution pages (corporate, higher-ed, K-12) have no FAQ, no comparison tables, and no structured data beyond breadcrumbs. The /blog/ URL returns a 404: the blog lives under /research-insights/ instead, but the navigation still links to /blog/.
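One way to confirm which schema types a page declares is to collect every @type value from its JSON-LD blocks. The sketch below assumes the page HTML has already been fetched and that JSON-LD sits in script tags with type "application/ld+json", which is how Yoast SEO normally emits it. The class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _walk(node, found):
    """Recursively record every @type value in a parsed JSON-LD tree."""
    if isinstance(node, dict):
        types = node.get("@type", [])
        found.update([types] if isinstance(types, str) else types)
        for value in node.values():
            _walk(value, found)
    elif isinstance(node, list):
        for item in node:
            _walk(item, found)


def schema_types(html):
    """Return the set of schema.org types declared in a page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for block in collector.blocks:
        try:
            _walk(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks rather than abort the audit.
    return found
```

A page matching the description above would return a set without "Organization", so a check such as "Organization" in schema_types(html) evaluates to False.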
Cold-Knowledge Gap
The model's cold knowledge describes Hanover Research as an Arlington-based firm founded in 2003, with a "Research as a Service" subscription model, roughly 1,000 employees, and clients in K-12, higher ed, healthcare, and corporate sectors. The model also recalls a 2023 discrimination lawsuit as a reputational signal. The actual site never mentions the lawsuit, never states "Research as a Service" explicitly, and does not prominently feature the 2003 founding date on the homepage (it appears buried in the About page's "20 fun facts" timeline). The site positions itself around three verticals (Corporate, Higher Ed, K-12); healthcare is absent from the site's navigation despite the model associating it with the brand. The site claims "96% client satisfaction," "$1.4 billion in grant funding secured," and "1+ million surveys deployed annually," but none of these headline metrics appears in the model's cold knowledge.
External Signals
The news page shows active press mentions in Inside Higher Ed (April 2026), HRD, and local Michigan outlets, indicating ongoing media pickup. No Reddit threads, G2 reviews, or Clutch ratings surfaced in searches. The DNS TXT records reveal an OpenAI domain verification token (openai-domain-verification=dv-aYg3Gc4WRxkL5mgGTDDwEL2B), which shows that someone with DNS control has verified the domain with OpenAI. That is consistent with ChatGPT-User and OAI-SearchBot being allowed while GPTBot is blocked, though the token alone does not establish why it was added. DNS records also indicate use of Salesforce, Qualtrics, HubSpot, Mailgun, and Intacct, a mature MarTech stack.
Surprising Signals
The most striking finding is the selective bot blockade: GPTBot is blocked, but ChatGPT-User (the agent that fetches pages for ChatGPT's live browsing) is allowed, and the domain carries an OpenAI verification token. This suggests a deliberate strategy to allow OpenAI's retrieval agents while blocking its training crawler. Requests sent as Google-Extended are also dropped. Google-Extended is a robots.txt control token rather than a crawler with its own user-agent string, so this result shows how the filter treats that string, not what Google can fetch; Google's regular search crawler (Googlebot) presumably can access the site. ClaudeBot and anthropic-ai are both blocked, so the Anthropic user agents tested here cannot reach the site at all. The /about/ URL redirects to /about-us/, a minor canonical issue. The dateModified on the homepage is set to 2026-03-31. The audit flagged this as a future date, but it precedes this report's 30 May 2026 completion date, so the flag should be verified against the page's actual last edit.
Findings
Server-level bot filtering blocks GPTBot, Google-Extended, ClaudeBot, and others (Severity: High)
The nginx server silently disconnects GPTBot, Google-Extended, ClaudeBot, anthropic-ai, Applebot-Extended, and Bytespider with a RemoteProtocolError, while allowing ChatGPT-User, OAI-SearchBot, PerplexityBot, and Perplexity-User. This is not a robots.txt block but a server-side WAF or nginx rule.
What to change: Remove the server-side UA-based blocking for GPTBot, Google-Extended, and ClaudeBot, or replace it with a robots.txt-based approach that clearly communicates access policy.
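If the robots.txt route is chosen, the policy can be drafted and checked before deployment. The sketch below uses Python's standard urllib.robotparser to confirm that a draft file says what is intended. The policy itself is hypothetical: which crawlers to disallow is the site owner's decision, and this draft simply mirrors the split currently enforced at the server while keeping the two existing Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft policy. Training-oriented tokens are disallowed;
# every other agent, including retrieval bots such as ChatGPT-User and
# PerplexityBot, falls through to the "*" group.
PROPOSED_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /wp-admin/
Disallow: /style-guide/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, agent, path):
    """True when the parsed policy lets the given agent fetch the path."""
    return parser.can_fetch(agent, path)
```

Unlike a dropped connection, a policy expressed this way is readable by any crawler operator, and compliant crawlers can tell an intentional opt-out from an outage.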
No llms.txt file published (Severity: Medium)
The llms.txt endpoint returns a 404 WordPress page, meaning no machine-readable guidance for AI crawlers exists.
What to change: Publish an llms.txt file that lists key pages and provides a brief summary of the site's content for AI crawlers.
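As a starting point, an llms.txt file can be generated from a short list of pages. The sketch below follows the layout of the llmstxt.org proposal (an H1 title, a blockquote summary, then H2 sections of links). The summary text and link notes are placeholders rather than site copy, the https scheme on the bare domain is an assumption, and only the two paths confirmed in this report are used.

```python
def render_llms_txt(name, summary, sections):
    """Render llms.txt text in the layout of the llmstxt.org proposal.

    sections maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Illustrative content only: placeholder wording, confirmed paths.
LLMS_TXT = render_llms_txt(
    "Hanover Research",
    "Research firm serving Corporate, Higher Education, and K-12 clients.",
    {
        "Key pages": [
            ("About", "https://hanoverresearch.com/about-us/",
             "Company background"),
            ("Research Insights", "https://hanoverresearch.com/research-insights/",
             "Blog and reports"),
        ]
    },
)
```

The resulting text would be served at /llms.txt as plain text. Solution pages can be added as further tuples once their canonical URLs are confirmed.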
No Organization schema on any page (Severity: Medium)
JSON-LD schema via Yoast SEO includes WebPage, WebSite, BreadcrumbList, ImageObject, and Article types, but no Organization schema with logo, social profiles, or founding date.
What to change: Add Organization schema to the homepage and About page, including name, logo, founding date, social media URLs, and description.
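The shape of such an Organization object can be sketched as follows. The property names (name, url, logo, foundingDate, sameAs, description) are standard schema.org Organization properties. The logo path, social profile URL, and description are hypothetical placeholders, and the founding year is the one this report attributes to the About page timeline.

```python
import json


def organization_jsonld(name, url, logo, founding_date, same_as, description):
    """Build a schema.org Organization object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "foundingDate": founding_date,
        "sameAs": list(same_as),
        "description": description,
    }
    return json.dumps(data, indent=2)


# Placeholder values: replace the logo, profile, and description with
# the real ones before publishing.
snippet = organization_jsonld(
    name="Hanover Research",
    url="https://hanoverresearch.com/",
    logo="https://hanoverresearch.com/path/to/logo.png",
    founding_date="2003",
    same_as=["https://www.linkedin.com/company/EXAMPLE"],
    description="Placeholder description.",
)
```

The string in snippet would be embedded in a script tag of type application/ld+json on the homepage and About page.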
Navigation links to /blog/ return 404 (Severity: Medium)
The /blog/ URL returns a 404 page, but the site's navigation still links to it. The actual blog content is at /research-insights/.
What to change: Update the navigation to point to /research-insights/ and set up a 301 redirect from /blog/ to /research-insights/.
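Once the redirect is in place, it can be verified by requesting /blog/ without following redirects and inspecting the status and Location header. This is a minimal sketch using the standard library's http.client. The helper names are illustrative, and accepting 308 alongside 301 as a permanent redirect is a choice made here, not something the report specifies.

```python
import http.client
from urllib.parse import urlsplit


def head(url, timeout=15):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status, location, expected_path):
    """True for a 301 or 308 whose Location path equals expected_path."""
    if status not in (301, 308) or not location:
        return False
    return urlsplit(location).path == expected_path
```

For example, is_permanent_redirect_to(*head("https://hanoverresearch.com/blog/"), "/research-insights/") should become True after the fix. At the time of this report the same request returns a 404.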
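Homepage dateModified flagged as future date 2026-03-31 (Severity: Low)
The homepage's dateModified in schema is set to 2026-03-31, which was flagged as a future date that may confuse crawlers about content freshness. That date precedes this report's 30 May 2026 completion date, so confirm it against the page's actual last edit.
What to change: Update the dateModified to the actual last modification date, or remove it if it is not maintained.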
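Whether a dateModified value counts as future-dated depends on the date it is compared against. A minimal sketch of that comparison, assuming the value is an ISO 8601 string whose first ten characters are the calendar date:

```python
from datetime import date


def is_future(date_modified, as_of):
    """True when an ISO 8601 date string is later than the as_of date."""
    # Only the date part is compared; a trailing time or offset is ignored.
    return date.fromisoformat(date_modified[:10]) > as_of
```

Against the 30 May 2026 completion date shown in this report, is_future("2026-03-31", date(2026, 5, 30)) evaluates to False. The value is only future-dated relative to a check run before 31 March 2026.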
Healthcare vertical absent from site navigation despite LLM association (Severity: Medium)
The site's navigation lists only Corporate, Higher Education, and K-12 solutions, but the LLM associates healthcare with the brand as a key vertical. No healthcare page was found in the sitemap.
What to change: If Hanover Research still serves healthcare clients, add a healthcare solutions page or section so the site matches its actual service offerings. If it does not, state the three current verticals explicitly so the outdated association is easier to correct.
Headline metrics (96% satisfaction, $1.4B grants) not in LLM knowledge (Severity: Medium)
The site prominently claims '96% client satisfaction', '$1.4 billion in grant funding secured', and '1+ million surveys deployed annually', but these metrics are absent from the LLM's cold knowledge.
What to change: Ensure these metrics are included in structured data (e.g., Organization schema) and in the llms.txt summary to improve AI visibility.
No reviews on G2, Clutch, or Reddit (Severity: Low)
Web searches for Hanover Research on G2, Clutch, and Reddit returned zero results, limiting external social proof signals for AI models.
What to change: Encourage clients to leave reviews on G2 and Clutch, and consider engaging in relevant Reddit communities.
/about/ redirects to /about-us/ (Severity: Low)
The /about/ URL redirects to /about-us/, which is a minor canonical issue but could be streamlined.
What to change: Use a single canonical URL for the About page and ensure internal links point to the canonical version.
Solution pages lack FAQ schema and comparison tables (Severity: Low)
The corporate, higher-ed, and K-12 solution pages have no FAQ or comparison table structured data, missing opportunities for rich results.
What to change: Add FAQPage schema and comparison tables to solution pages to improve search visibility and answer-format signals.
What's working
- OpenAI domain verification token present: The DNS TXT records include an OpenAI domain verification token, showing that the domain has been verified with OpenAI.
- All key pages are content-rich WordPress pages with 1,000-1,800 words: The homepage and solution pages have substantial visible text content, no JS-rendering risk, and are fully accessible to allowed bots.
- FAQPage schema on the State of Market Research 2025 report: The report page uses FAQPage schema with actual Q&A pairs, providing strong answer-format signals for search engines and AI.
- Active press mentions in Inside Higher Ed and other outlets: The news page shows recent press coverage, indicating ongoing media pickup and external validation.
- Mature MarTech stack including Salesforce, Qualtrics, HubSpot: DNS records indicate use of Salesforce, Qualtrics, HubSpot, Mailgun, and Intacct, reflecting a sophisticated marketing technology infrastructure.
- Strong security headers (HSTS, CSP, X-Frame-Options) in place: The site returns HSTS, CSP, and X-Frame-Options headers, providing robust security and trust signals.
- Sitemap with 80 URLs and index present: The sitemap is accessible and contains 80 URLs with an index, aiding crawler discovery.
- Robots.txt is minimal and does not block AI bots: The robots.txt only blocks /wp-admin/ and /style-guide/ for all user agents, and allows Scrapy broadly, so it does not contribute to bot blocking.