AI Site Grade
premiersothebysrealty.com — AI Site Grade
Premier Sotheby's selectively blocks OAI-SearchBot and ChatGPT-User at the WAF while allowing GPTBot, creating a confusing OpenAI access policy.
The site's WAF blocks OAI-SearchBot and ChatGPT-User with 403 errors despite a permissive robots.txt, while GPTBot and ClaudeBot get full content; schema is limited to a single Corporation type, and agent and office pages return HTTP 429 (rate-limited) responses.
- Findings: 9
- Evidence checks: 24
- Completed: 30 May 2026
Analysis
OAI-SearchBot and ChatGPT-User are blocked at the edge while GPTBot gets full content — a selective OpenAI blockade
The site's robots.txt explicitly names GPTBot and OAI-SearchBot with only a crawl-delay: 10 directive (no Disallow). Even so, the compare_bot_access check shows that OAI-SearchBot and ChatGPT-User both receive HTTP 403 ("Request Rejected") from the WAF/edge layer, while GPTBot gets a full 200 response with 162KB of content identical to the browser baseline. This is not a robots.txt block. It is a UA-based WAF rule that stops OpenAI's search and chat-user crawlers while letting GPTBot (the training crawler) through. Anthropic's agents show the same split: anthropic-ai receives a 403, while ClaudeBot receives a 200.
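You can confirm that robots.txt is not the cause by running the policy through Python's standard urllib.robotparser. The robots.txt below is an assumed reconstruction of the description above (crawl-delay 10 for GPTBot and OAI-SearchBot, no Disallow lines), not the site's verbatim file. If the file permits a bot and that bot still receives a 403, a separate edge rule is doing the blocking.

```python
from typing import Dict, List, Optional, Tuple
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a reconstruction of the policy described above (named AI bots get
# only a Crawl-delay, no Disallow lines). This is not the site's verbatim file.
ROBOTS_TXT = """\
User-agent: GPTBot
Crawl-delay: 10

User-agent: OAI-SearchBot
Crawl-delay: 10
"""


def robots_verdicts(
    robots_txt: str, agents: List[str], path: str = "/"
) -> Dict[str, Tuple[bool, Optional[int]]]:
    """Map each user-agent token to (allowed by robots.txt, crawl delay in seconds)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: (parser.can_fetch(agent, path), parser.crawl_delay(agent))
        for agent in agents
    }


if __name__ == "__main__":
    verdicts = robots_verdicts(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "ChatGPT-User"])
    for agent, (allowed, delay) in verdicts.items():
        print(f"{agent}: allowed={allowed}, crawl_delay={delay}")
```

Under this policy, every agent is allowed. ChatGPT-User is not named in the file, so it falls through to the default allow with no crawl delay. The 403s must therefore come from the edge, not from robots.txt.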
Crawler Access
The site runs on Cloudflare DNS (archer.ns.cloudflare.com / katelyn.ns.cloudflare.com) with a Burrow Services ASP.NET backend. The WAF blocks Bytespider, OAI-SearchBot, ChatGPT-User, and anthropic-ai at the edge with HTTP 403. GPTBot, ClaudeBot, Google-Extended, PerplexityBot, Perplexity-User, and Applebot-Extended all receive the full 162KB page. The robots.txt is permissive for AI bots (only crawl-delay rules, no disallows) but the WAF overrides this for specific UAs. No llms.txt exists (404). The sitemap index is healthy with 30+ sub-sitemaps covering offices, agents, and featured listings.
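The per-UA comparison described above can be reproduced by requesting the same URL under different User-Agent headers and comparing status codes and body sizes against a browser baseline. This is a hypothetical stand-in, not the grader's compare_bot_access tool. The UA strings are assumed approximations: real crawlers send longer strings that contain these product tokens, and UA-based WAF rules usually match on those tokens. The 90% "full content" threshold is arbitrary.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple

# ASSUMED, simplified UA strings; real crawler UAs are longer but contain these tokens.
USER_AGENTS: Dict[str, str] = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.1)",
    "OAI-SearchBot": "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "ChatGPT-User": "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "anthropic-ai": "anthropic-ai",
}

Fetcher = Callable[[str, str], Tuple[int, int]]


def http_fetch(url: str, user_agent: str) -> Tuple[int, int]:
    """GET url with the given User-Agent; return (status code, body size in bytes)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=15) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return err.code, len(err.read())


def compare_access(url: str, fetch: Fetcher = http_fetch,
                   agents: Dict[str, str] = USER_AGENTS) -> Dict[str, str]:
    """Classify each bot's response against the browser baseline."""
    _, baseline_size = fetch(url, agents["browser"])
    verdicts: Dict[str, str] = {}
    for name, user_agent in agents.items():
        if name == "browser":
            continue
        status, size = fetch(url, user_agent)
        if status == 200:
            # Arbitrary cut-off: at least 90% of the browser's bytes counts as full content.
            verdicts[name] = "full" if size >= 0.9 * baseline_size else "thin"
        elif status in (403, 429):
            verdicts[name] = f"blocked ({status})"
        else:
            verdicts[name] = f"other ({status})"
    return verdicts
```

Calling compare_access("https://www.premiersothebysrealty.com/") performs live requests. Passing a fake fetch function, as in a test, lets the classification logic run offline.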
Schema Posture
Every page examined uses a single, identical Corporation schema with the same name, url, and sameAs array. No page carries RealEstateAgent, LocalBusiness, Product (for listings), FAQPage, or BreadcrumbList schema. Agent profile pages and office pages, which would naturally carry Person and RealEstateAgent schema, return HTTP 429 (rate-limited) to a browser UA. AI crawlers hitting those pages therefore likely see a "Please Verify You Are Human" wall or a thin JS shell. Property listing pages lack SingleFamilyResidence, Offer, and Place schema entirely.
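The office and agent schema described above could be generated per page roughly as follows. This is a minimal sketch assuming one office entity per office page and one Person per agent profile. Every name, URL, phone number, and address in the test values is hypothetical. The types and properties used (RealEstateAgent, PostalAddress, parentOrganization, Person, jobTitle, worksFor) are standard schema.org vocabulary.

```python
import json
from typing import Any, Dict


def office_jsonld(name: str, url: str, telephone: str, address: Dict[str, str],
                  parent_name: str, parent_url: str) -> Dict[str, Any]:
    """RealEstateAgent (a schema.org LocalBusiness subtype) for one office page."""
    return {
        "@context": "https://schema.org",
        "@type": "RealEstateAgent",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {"@type": "PostalAddress", **address},
        # Links the office back to the Corporation entity already used site-wide.
        "parentOrganization": {"@type": "Corporation", "name": parent_name, "url": parent_url},
    }


def agent_jsonld(name: str, url: str, job_title: str, office: Dict[str, Any]) -> Dict[str, Any]:
    """Person for one agent profile, linked to the office entity via worksFor."""
    works_for = {key: value for key, value in office.items() if key != "@context"}
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "url": url,
        "jobTitle": job_title,
        "worksFor": works_for,
    }


def to_script_tag(data: Dict[str, Any]) -> str:
    """Serialize as embeddable JSON-LD; escaping '</' keeps the script element intact."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Nesting the office inside worksFor keeps each agent page self-describing, even if a crawler never reaches the office page.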
Cold-Knowledge Gap
The LLM knows Premier Sotheby's as "one of the largest affiliates in the Sotheby's International Realty network, with over 30 offices" serving Florida and the Southeast. The site itself claims 40 locations and 1,100+ associates, so the LLM materially understates the brokerage's size. The LLM also names a "luxury magazine" and a "global listing platform via sothebysrealty.com" as key offerings. The site's own navigation instead emphasizes Elevated Services (brokerage, mortgage, title insurance) and new developments, none of which the LLM mentions. The site's copyright reads 2026, which matches this report's completion date (30 May 2026), so on its own it does not indicate a forward-dated template.
External Signals
External searches for the brand returned zero indexed results across multiple queries: no Reddit threads, review sites, or press mentions surfaced. This is unusual for a brokerage of this scale. It may indicate low off-domain citation density, which limits the corpus AI models draw from when building brand descriptions. The site's own sameAs links point to active social profiles (Facebook, Instagram, LinkedIn, YouTube, Pinterest, Twitter), but these are not being picked up in search snippets.
Findings
OAI-SearchBot and ChatGPT-User blocked at WAF while GPTBot allowed (High)
The site's WAF returns HTTP 403 for OAI-SearchBot and ChatGPT-User, but GPTBot receives full 200 content. This selective blocking contradicts the permissive robots.txt and limits AI search visibility.
What to change: Remove the WAF rules blocking OAI-SearchBot and ChatGPT-User, or align them with the robots.txt policy to allow these crawlers.
anthropic-ai crawler blocked at WAF while ClaudeBot allowed (High)
The anthropic-ai user agent receives HTTP 403, but ClaudeBot gets full content. This inconsistent access limits visibility for Anthropic's AI products.
What to change: Remove the WAF rule blocking anthropic-ai to match the permissive robots.txt.
No llms.txt file published (Medium)
The site returns 404 for llms.txt, missing an opportunity to guide AI crawlers to key content.
What to change: Publish an llms.txt file listing important pages like agent directories, office pages, and property search.
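An llms.txt file along the lines recommended above could be generated from a list of key pages. This sketch assumes the layout from the llms.txt proposal: an H1 site name, a blockquote summary, then H2 sections of Markdown link lists. All paths and the site name in the example are hypothetical placeholders. The summary reuses the site's own 40-location and 1,100+-associate figures.

```python
from typing import Dict, List, Tuple

# (link title, absolute URL, optional one-line note)
Link = Tuple[str, str, str]


def build_llms_txt(site_name: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render an llms.txt body: H1 name, blockquote summary, then H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            lines.append(f"{entry}: {note}" if note else entry)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # HYPOTHETICAL paths; substitute the site's real directory and search URLs.
    base = "https://www.example.com"
    print(build_llms_txt(
        "Example Brokerage",
        "Luxury residential brokerage with 40 locations and 1,100+ associates.",
        {
            "Directories": [
                ("Agent directory", f"{base}/agents", "profiles for every associate"),
                ("Office locations", f"{base}/offices", ""),
            ],
            "Search": [("Property search", f"{base}/search", "active listings")],
        },
    ))
```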
Agent and office pages return HTTP 429 to browsers (High)
Agent profile and office pages return 429 with a "Please Verify You Are Human" message. If AI crawlers receive the same response, these pages are effectively inaccessible to them.
What to change: Remove rate limiting for agent and office pages, or allowlist verified AI crawlers by their published IP ranges.
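The IP-range option can be sketched as a check that runs before the rate limiter. A User-Agent header is trivially spoofed, so the exemption applies only when the source IP falls inside a range published for the bot the UA claims to be. The CIDR blocks below are placeholders taken from the RFC 5737 documentation ranges, not any operator's real ranges, and the function name is hypothetical.

```python
import ipaddress
from typing import Dict, List

# PLACEHOLDER ranges (RFC 5737 documentation blocks). Replace them with the
# CIDR lists each crawler operator actually publishes.
CRAWLER_RANGES: Dict[str, List[str]] = {
    "OAI-SearchBot": ["192.0.2.0/24"],
    "ChatGPT-User": ["198.51.100.0/24"],
}


def is_verified_crawler(claimed_bot: str, client_ip: str,
                        ranges: Dict[str, List[str]] = CRAWLER_RANGES) -> bool:
    """True only if client_ip sits inside a range listed for the bot named in the UA."""
    address = ipaddress.ip_address(client_ip)
    # Mixed IPv4/IPv6 membership tests simply return False.
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges.get(claimed_bot, []))
```

A request that passes this check would skip the 429/verification path. Everything else, including UAs that only claim to be a crawler, stays under the existing rules.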
No RealEstateAgent or LocalBusiness schema on any page (High)
All pages use only a generic Corporation schema. Agent and office pages lack RealEstateAgent, Person, or LocalBusiness schema, reducing structured data relevance for AI.
What to change: Add RealEstateAgent, LocalBusiness, and Person schema to relevant pages, and SingleFamilyResidence/Offer schema to property listings.
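For listing pages, one common shape for the recommended SingleFamilyResidence/Offer markup is an Offer whose itemOffered is the residence. The Offer carries price and priceCurrency; the residence carries address, room count, and floor size. This is a sketch of one pattern, not the only valid one, and the listing URL and values in the test are hypothetical. "FTK" is the UN/CEFACT unit code for square feet.

```python
import json
from typing import Any, Dict


def listing_jsonld(url: str, address: Dict[str, str], rooms: int, floor_sqft: int,
                   price: float, currency: str = "USD") -> Dict[str, Any]:
    """Offer wrapping a SingleFamilyResidence for one listing page."""
    residence = {
        "@type": "SingleFamilyResidence",
        "url": url,
        "address": {"@type": "PostalAddress", **address},
        "numberOfRooms": rooms,
        # UN/CEFACT code "FTK" = square foot.
        "floorSize": {"@type": "QuantitativeValue", "value": floor_sqft, "unitCode": "FTK"},
    }
    # The Offer holds the commercial terms and points at the property via itemOffered.
    return {
        "@context": "https://schema.org",
        "@type": "Offer",
        "url": url,
        "price": price,
        "priceCurrency": currency,
        "itemOffered": residence,
    }


if __name__ == "__main__":
    print(json.dumps(listing_jsonld(
        "https://www.example.com/listings/123",  # HYPOTHETICAL listing URL
        {"addressLocality": "Naples", "addressRegion": "FL"},
        rooms=4, floor_sqft=2500, price=1250000,
    ), indent=2))
```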
Missing FAQPage and BreadcrumbList schema (Medium)
No FAQPage or BreadcrumbList schema found on any page, missing opportunities for rich snippets and AI context.
What to change: Add BreadcrumbList schema to all pages and FAQPage schema to pages with Q&A content.
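BreadcrumbList markup is mechanical enough to generate from the page's navigation trail. In this sketch, each ListItem gets a 1-based position, a name, and an item URL, as schema.org's BreadcrumbList expects. The trail in the test (Home > Offices > Naples) uses hypothetical URLs.

```python
from typing import Any, Dict, List, Tuple


def breadcrumb_jsonld(trail: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build BreadcrumbList JSON-LD from an ordered list of (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)  # positions start at 1
        ],
    }
```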
LLM understates office count and omits services (Medium)
The LLM cites 30+ offices, but the site claims 40 locations and 1,100+ associates. The LLM does not mention services such as mortgage and title insurance.
What to change: Ensure the site's content and structured data clearly state the number of offices and associates, and highlight all services on key pages.
No external search results found for the brand (Medium)
Web searches for the brand and related terms returned zero results, indicating low off-domain citation density.
What to change: Increase off-domain citations through PR, guest posts, and listings on review sites to improve AI corpus representation.
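Copyright year set to 2026 (Low)
The site's copyright reads 2026, which matches the year of this report, so the footer is current today. The risk is only that a hard-coded year will look stale to AI crawlers in later years.
What to change: Generate the copyright year dynamically instead of hard-coding it.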
What's working
- Robots.txt allows all AI crawlers with only crawl-delay — The robots.txt file does not disallow any AI bots, setting a permissive baseline for crawling.
- Sitemap index with 30+ sub-sitemaps covering key content — The sitemap index is healthy and includes sub-sitemaps for offices, agents, and featured listings, aiding discovery.
- GPTBot and ClaudeBot receive full 200 content — Training crawlers like GPTBot and ClaudeBot get the full page content, enabling AI models to learn from the site.
- SameAs links point to active social media profiles — The site includes sameAs links to Facebook, Instagram, LinkedIn, YouTube, Pinterest, and Twitter, supporting brand authority.
- Region pages contain substantial text content — Pages like the Naples region page have over 1,600 words of descriptive content, providing rich material for AI indexing.