AI Site Grade

polsinelli.com — AI Site Grade

Polsinelli's entire website is a JS shell invisible to most major AI crawlers, despite completing OpenAI domain verification.

Polsinelli's React SPA returns zero visible text to AI crawlers, Cloudflare blocks GPTBot, ClaudeBot, and ChatGPT-User, and the site lacks all structured data, making it invisible to AI training pipelines.

Findings: 12
Evidence checks: 44
Completed: 30 May 2026

Analysis

The site's Cloudflare WAF blocks GPTBot, Google-Extended, ClaudeBot, ChatGPT-User, OAI-SearchBot, Perplexity-User, and anthropic-ai with 403 "Just a moment..." challenge pages. Separately, a plain GET request to the homepage returns zero words of visible text: the raw HTML is a pure React SPA shell.
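The block pattern described above can be spot-checked with a small probe. This is a minimal sketch, not the tooling behind this report: it assumes each check sends the crawler's bare token as the User-Agent header, and the names Probe, fetch, classify, and summarize are hypothetical. A spoofed user agent from an arbitrary IP may be challenged where the genuine crawler would not be, so results are indicative only.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass

# Tokens named in this report. Real crawlers send longer User-Agent strings.
AI_CRAWLER_TOKENS = [
    "GPTBot", "Google-Extended", "ClaudeBot", "ChatGPT-User",
    "OAI-SearchBot", "Perplexity-User", "anthropic-ai", "Bytespider",
]


@dataclass
class Probe:
    token: str   # user-agent token that was sent
    status: int  # HTTP status code received
    body: str    # decoded response body


def fetch(url: str, token: str, timeout: float = 10.0) -> Probe:
    """Request `url` with `token` as the User-Agent and record the outcome."""
    req = urllib.request.Request(url, headers={"User-Agent": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return Probe(token, resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error still carries the status and body.
        return Probe(token, err.code, err.read().decode("utf-8", "replace"))


def classify(probe: Probe) -> str:
    """Label a response as 'challenged', 'ok', or 'other'."""
    if probe.status == 403 and "Just a moment" in probe.body:
        return "challenged"  # matches the challenge page text quoted in this report
    if probe.status == 200:
        return "ok"
    return "other"


def summarize(probes) -> dict[str, str]:
    """Map each probed token to its classification."""
    return {p.token: classify(p) for p in probes}
```

Calling summarize(fetch("https://www.polsinelli.com/", t) for t in AI_CRAWLER_TOKENS) would issue live requests, so it is left out of the block; classify can be exercised offline with hand-built Probe values.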

Crawler Access

The robots.txt at https://www.polsinelli.com/robots.txt contains a single catch-all rule (User-agent: *) that only disallows /?s=* and sets a crawl-delay of 5 seconds. No AI-specific user-agents are mentioned at all. Despite this permissive robots.txt, Cloudflare's bot detection independently blocks the majority of AI crawlers. Bytespider is the only AI crawler observed to receive a 200 response, with the full 194KB JavaScript-dependent payload. The request for llms.txt fails with an SSL error, so no llms.txt file is being served to crawlers. The sitemap at https://www.polsinelli.com/sitemap.xml is well-formed and contains 1,300+ URLs, but the sitemap index at https://www.polsinelli.com/sitemap_index.xml points only to that single sitemap.
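To illustrate why the robots.txt itself is not what blocks AI crawlers, the sketch below feeds a reconstruction of the file to Python's standard urllib.robotparser. The ROBOTS_TXT text is an assumption rebuilt from the description above, not a copy of the live file, and the helper name policy_for is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the report's description; the live file may differ in detail.
ROBOTS_TXT = """\
User-agent: *
Disallow: /?s=*
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def policy_for(agent: str, url: str) -> tuple[bool, object]:
    """Return (allowed, crawl_delay) for `agent` fetching `url`."""
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


# Note: the standard-library parser matches rule paths by prefix and does not
# implement '*' wildcards, so it is only used here for an ordinary page URL.
for agent in ("GPTBot", "ClaudeBot", "OAI-SearchBot"):
    allowed, delay = policy_for(agent, "https://www.polsinelli.com/about-us")
    print(f"{agent}: allowed={allowed}, crawl_delay={delay}")
```

Under these assumptions every agent falls through to the catch-all group, so each is allowed with a crawl delay of 5. Any 403 therefore comes from the edge layer rather than from robots.txt.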

Content & Schema

Every page fetched — homepage, /about-us, /capabilities, /news, /publications, /people, /offices-locations, /careers, /blogs-podcasts — returned zero words of visible text and zero JSON-LD schema objects. The <title> tag on the homepage reads "Polsinelli Law Firm" (when fetched via Bytespider) but the plain GET returns only "Polsinelli". No og:description, no meta description, no canonical tags, no heading structure, no FAQ or comparison signals exist in the raw HTML. The site is a single-page application that renders content client-side, meaning search engine crawlers and AI bots that cannot execute JavaScript see nothing.
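The zero-text result can be reproduced offline by counting what a non-rendering crawler would see in raw HTML. The following is a rough sketch under stated assumptions: SPA_SHELL and SSR_SAMPLE are invented stand-ins (including the id="root" mount point and the placeholder text), not markup captured from the site, and RawHtmlAudit and audit are hypothetical names.

```python
import json
from html.parser import HTMLParser

SKIP = {"script", "style", "noscript", "template"}  # content not shown as page text
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class RawHtmlAudit(HTMLParser):
    """Counts visible body words, heading tags, and JSON-LD blocks in raw HTML."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.words = 0
        self.headings = 0
        self.json_ld = []
        self._skip_depth = 0
        self._in_body = False
        self._in_json_ld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in HEADINGS:
            self.headings += 1
        if tag in SKIP:
            self._skip_depth += 1
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self._in_json_ld, self._buf = True, []

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth:
            self._skip_depth -= 1
            if tag == "script" and self._in_json_ld:
                self._in_json_ld = False
                try:
                    self.json_ld.append(json.loads("".join(self._buf)))
                except json.JSONDecodeError:
                    pass  # malformed JSON-LD is not counted

    def handle_data(self, data):
        if self._in_json_ld:
            self._buf.append(data)
        elif self._in_body and not self._skip_depth:
            self.words += len(data.split())


def audit(html: str) -> RawHtmlAudit:
    parser = RawHtmlAudit()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical SPA shell: a title, an empty mount point, and a script bundle.
SPA_SHELL = ('<html><head><title>Polsinelli</title></head><body>'
             '<div id="root"></div><script src="/app.js"></script></body></html>')
# Hypothetical server-rendered page with placeholder text and one JSON-LD block.
SSR_SAMPLE = ('<html><body><h1>Polsinelli</h1><p>Placeholder practice description.</p>'
              '<script type="application/ld+json">{"@type": "Organization"}</script>'
              '</body></html>')

for label, html in (("shell", SPA_SHELL), ("ssr", SSR_SAMPLE)):
    result = audit(html)
    print(label, result.words, result.headings, len(result.json_ld))
```

The design choice is to count only text inside body and outside script-like elements, which approximates what a crawler without a JavaScript engine can extract.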

Cold-Knowledge Gap

An LLM queried about "Polsinelli law firm" returned detailed knowledge: founded 1972 in Kansas City, 900+ attorneys across 20+ offices, top-ranked health law and real estate practices, Chambers and Best Lawyers recognition, and a strong diversity reputation. This prior knowledge is entirely disconnected from what the site actually serves: the homepage does not communicate any of these facts in machine-readable form. The model's knowledge appears to come from third-party legal directories and news coverage, not from the firm's own digital presence.

External Signals

The DNS records reveal an openai-domain-verification TXT record (dv-J6Z7uiAT252QvcSYrx0pHzKW), indicating Polsinelli has actively registered with OpenAI for some form of verification or crawling access — yet ChatGPT-User and OAI-SearchBot are both blocked by Cloudflare. Multiple google-site-verification records exist, confirming Google Search Console setup. The site is hosted on Cloudflare with UltraDNS nameservers and Mimecast for email security. No external search results were retrievable via the search tool, but the Wayback Machine shows 1,026 captures since October 2000, confirming long-standing web presence.

Surprising Contradictions

The most striking finding: Polsinelli's DNS carries an openai-domain-verification TXT record, which indicates the firm completed OpenAI's domain verification, yet the site still blocks ChatGPT-User and OAI-SearchBot at the Cloudflare edge. This suggests either a misconfiguration between the Cloudflare WAF rules and the OpenAI verification, or that the verification was done for a different purpose (e.g., ChatGPT plugin access) without updating bot policies. Additionally, the sitemap's <lastmod> dates extend into May 2026. Because this audit was completed on 30 May 2026, only entries dated after the crawl date are genuinely future-dated; any that are point to either aggressive future-dating or a CMS configuration issue. Finally, the site's React SPA architecture means even a bot that gets past Cloudflare (Bytespider) receives a page that requires JavaScript execution to extract meaningful content, a significant barrier for AI crawlers that do not render JavaScript.
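Whether a <lastmod> value is actually future-dated depends on the reference date, which a short check makes explicit. This is a sketch under assumptions: the SAMPLE sitemap and its lastmod values are made up for illustration (only the two page paths appear in this report), and future_lastmods is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sitemap; the lastmod values are invented for this example.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.polsinelli.com/about-us</loc><lastmod>2026-05-12</lastmod></url>
  <url><loc>https://www.polsinelli.com/news</loc><lastmod>2026-06-02T08:00:00+00:00</lastmod></url>
</urlset>"""


def future_lastmods(sitemap_xml: str, today: date) -> list[tuple[str, date]]:
    """Return (loc, lastmod) pairs whose lastmod date is later than `today`."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        try:
            # Full W3C datetime values start with YYYY-MM-DD; compare dates only.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # missing or reduced-precision values (e.g. "2026-05") are skipped
        if modified > today:
            flagged.append((loc, modified))
    return flagged


# Using the audit's completion date as the reference point.
print(future_lastmods(SAMPLE, date(2026, 5, 30)))
```

With the audit date as the reference, the mid-May entry is not flagged and only the June entry is, which is why the comparison date matters when judging this finding.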

Findings

  1. Entire site is a client-side React SPA with zero visible text in raw HTML (Severity: High)

    Every page fetched (homepage, /about-us, /capabilities, /news, /publications, /people, /offices-locations, /careers, /blogs-podcasts) returned zero words of visible text and zero JSON-LD schema objects in the raw HTML. The site requires JavaScript execution to render content, which most AI crawlers cannot do.

    What to change: Implement server-side rendering (SSR) or static site generation (SSG) to deliver meaningful HTML content to crawlers. Ensure all pages include visible text and structured data in the initial server response.

  2. Cloudflare WAF blocks GPTBot, ClaudeBot, ChatGPT-User, and other major AI crawlers with 403 (Severity: High)

    GPTBot, Google-Extended, ClaudeBot, ChatGPT-User, OAI-SearchBot, Perplexity-User, and anthropic-ai all receive 403 'Just a moment...' challenge pages from Cloudflare. Only Bytespider receives a 200 response. This keeps both AI training crawlers and user-facing AI search and retrieval agents from accessing the site.

    What to change: Update Cloudflare WAF rules to allow known AI crawler user agents (GPTBot, ClaudeBot, ChatGPT-User, etc.) to access the site without challenge. Alternatively, use Cloudflare's bot management to allow verified bots.

  3. OpenAI domain verification completed but ChatGPT-User and OAI-SearchBot still blocked (Severity: High)

    DNS records include an openai-domain-verification TXT record, indicating Polsinelli actively registered with OpenAI. However, ChatGPT-User and OAI-SearchBot are both blocked by Cloudflare, suggesting a misconfiguration between the verification and bot access policies.

    What to change: Review Cloudflare WAF rules to ensure that verified OpenAI bots (ChatGPT-User, OAI-SearchBot) are allowed through. Confirm what the openai-domain-verification record was issued for, and align bot access policy with that purpose.

  4. Zero JSON-LD or structured data on any page (Severity: High)

    No JSON-LD schema objects were found on any fetched page. The site lacks Organization, LocalBusiness, WebSite, FAQ, or any other structured data that would help AI models understand the firm's services, locations, and expertise.

    What to change: Add JSON-LD structured data to all pages, including Organization schema with name, description, founding date, number of employees, and practice areas. Add LocalBusiness schema for each office location.

  5. Missing meta descriptions, og:description, and canonical tags (Severity: Medium)

    The homepage and other pages lack meta description tags, Open Graph descriptions, and canonical URLs. The <title> tag varies between 'Polsinelli' and 'Polsinelli Law Firm' depending on fetch method. This reduces search snippet quality and AI understanding.

    What to change: Add unique meta descriptions and og:description tags to every page. Ensure consistent <title> tags across all fetch methods. Add canonical tags to prevent duplicate content issues.

  6. llms.txt endpoint returns SSL error (Severity: Medium)

    The llms.txt endpoint at polsinelli.com/llms.txt returns an SSL certificate error, so no llms.txt file is being served. llms.txt is a proposed convention, not a formal standard, for giving AI crawlers a summary of the site's content and structure.

    What to change: Create an llms.txt file at the root of the domain that provides a plain-text summary of the site's content, key pages, and structured data for AI crawlers.

  7. Sitemap contains lastmod dates extending to May 2026 (Severity: Low)

    The sitemap's <lastmod> dates extend into May 2026. With the audit completed on 30 May 2026, any entries dated after the crawl date are in the future, which may indicate a CMS configuration issue or aggressive future-dating and can confuse crawlers about content freshness.

    What to change: Ensure sitemap <lastmod> dates reflect actual last modification dates and do not exceed the current date. Review CMS configuration for date handling.

  8. Robots.txt does not mention any AI-specific user agents (Severity: Medium)

    The robots.txt file contains only a catch-all rule that disallows /?s=* and sets a crawl-delay of 5 seconds. No AI-specific user agents (GPTBot, ClaudeBot, etc.) are mentioned, leaving their access entirely to Cloudflare's discretion.

    What to change: Add explicit rules for AI crawler user agents in robots.txt, allowing them access to relevant sections of the site. For example, a 'User-agent: GPTBot' line followed on the next line by 'Allow: /'.

  9. LLM prior knowledge about Polsinelli is not reflected on the site (Severity: Medium)

    An LLM queried about 'Polsinelli law firm' returned detailed knowledge (founded 1972, 900+ attorneys, top health law practice, Chambers recognition) that is entirely absent from the site's machine-readable content. The site does not communicate these facts in any structured or textual form.

    What to change: Ensure the homepage and about page include key facts (founding year, number of attorneys, practice areas, awards) in visible text and structured data. This helps AI models associate the site with the firm's reputation.

  10. No heading structure (h1-h6) in raw HTML (Severity: Medium)

    The raw HTML of all fetched pages contains no heading tags (h1, h2, etc.). This makes it difficult for crawlers to understand page hierarchy and content importance.

    What to change: Add proper heading structure (h1 for page title, h2 for sections, etc.) to all pages in the server-rendered HTML.

  11. Bytespider is the only AI crawler receiving 200 responses (Severity: High)

    Among tested AI crawlers, only Bytespider (ByteDance's crawler) receives a 200 response with the full JS payload. All other major AI crawlers are blocked. This creates an uneven access pattern where only one AI company can index the site.

    What to change: Review Cloudflare WAF rules so that all major AI crawlers are treated consistently. Consider allowing GPTBot, ClaudeBot, and others on the same terms as Bytespider.

  12. No search results found for site:polsinelli.com queries (Severity: Medium)

    Multiple web searches for site:polsinelli.com and related queries returned zero results. This may indicate poor indexing or ineffective crawling by search engines, but it may equally reflect a limitation of the search tool used, so it should be confirmed in Google Search Console.

    What to change: Ensure the site is properly indexed by search engines by fixing the JS rendering issue and submitting the sitemap to Google Search Console. Monitor indexing status.
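As an illustration of the structured-data recommendation in finding 4, the sketch below builds Organization and LocalBusiness JSON-LD and wraps it in a script tag for inclusion in server-rendered HTML. It is a hedged example, not the firm's markup: the description and city are placeholders, the helper names are hypothetical, and real values (founding date, office addresses, practice areas) should come from the firm rather than from this report.

```python
import json


def organization_schema(name: str, url: str, description: str) -> dict:
    """Minimal schema.org Organization object."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
    }


def office_schema(org_name: str, org_url: str, city: str) -> dict:
    """Minimal schema.org LocalBusiness object for one office."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": f"{org_name} ({city})",
        "address": {"@type": "PostalAddress", "addressLocality": city},
        "parentOrganization": {"@type": "Organization", "name": org_name, "url": org_url},
    }


def to_script_tag(data: dict) -> str:
    """Serialize to a JSON-LD script tag, escaping '</' so the tag cannot close early."""
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values; replace with facts supplied and verified by the firm.
org = organization_schema(
    "Polsinelli",
    "https://www.polsinelli.com/",
    "PLACEHOLDER: one-sentence description of the firm",
)
office = office_schema("Polsinelli", "https://www.polsinelli.com/", "Example City")

print(to_script_tag(org))
print(to_script_tag(office))
```

The tags only help if they appear in the initial server response, so this pairs with the SSR or SSG change in finding 1 rather than replacing it.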

What's working

  • Robots.txt is permissive with no AI-specific blocks: The robots.txt file only disallows /?s=* and sets a crawl-delay, without blocking any AI crawlers explicitly. This provides a foundation for allowing AI access once Cloudflare rules are adjusted.
  • Sitemap is well-formed with 1,300+ URLs: The sitemap at /sitemap.xml is well-formed and contains over 1,300 URLs, providing a comprehensive list of pages for crawlers to discover.
  • OpenAI domain verification TXT record present: The DNS includes an openai-domain-verification TXT record, indicating Polsinelli has taken steps to verify domain ownership with OpenAI, which may support ChatGPT integrations. The record does not by itself grant crawler access.
  • Multiple Google site verification records present: DNS records contain multiple google-site-verification TXT records, confirming that Polsinelli has set up Google Search Console for monitoring and managing search presence.
  • Wayback Machine shows 1,026 captures since October 2000: The site has been archived by the Wayback Machine over 1,000 times since 2000, indicating a long-standing and historically active web presence.
  • Bytespider receives full 194KB JS payload: Bytespider, an AI crawler, receives the full 194KB JavaScript payload with a 200 status, indicating that the site is reachable by at least one AI crawler, although the content still requires JavaScript execution to read.
  • Cloudflare provides robust security and DDoS protection: The site uses Cloudflare for security, including bot detection and DDoS mitigation, which protects against malicious traffic. This is a positive security posture, though it currently over-blocks AI crawlers.
  • DNS records include email security and verification records: The DNS configuration includes Mimecast for email security, multiple verification records (OpenAI, Google), and UltraDNS nameservers, indicating a professionally managed infrastructure.
