AI Site Grade
fieldfisher.com — AI Site Grade
Fieldfisher's Cloudflare WAF blocks AI crawlers on the homepage and every core page, and the robots.txt explicitly disallows them, making the site close to a black box for AI systems.
Only /en/careers returns substantive content to crawlers, while the LLM's cold knowledge about the firm is richer than anything the live site reveals.
- Findings: 12
- Evidence checks: 52
- Completed: 30 May 2026
Analysis
---
Fieldfisher's Cloudflare WAF blocks every AI crawler — and the robots.txt explicitly bans them all
The site's robots.txt (served at /robots.txt with a 200 status) is a Cloudflare-managed file that disallows GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, Amazonbot, meta-externalagent, and CloudflareBrowserRenderingCrawler via Disallow: /. It also carries a Content-Signal: ai-train=no directive. The compare_bot_access test on the homepage confirmed that every AI bot, and even a standard browser, receives a 403 Cloudflare JS challenge, so no bot gets past the WAF to see the homepage content. The llms.txt file could not be retrieved (connection reset), and the sitemap is inaccessible (403). Apart from a handful of pages such as /en/careers, the site is effectively a black box to AI crawlers.
Cold-Knowledge Gap
The LLM's cold knowledge about Fieldfisher is richer and more specific than anything the live site reveals. The model knows Fieldfisher as a European law firm strong in technology, media, and financial services; that it represents Google and Meta; that it has a "Concert" legal tech platform for GDPR compliance; that it has 26+ offices across Europe and Asia; and that it faced reputational scrutiny over internal culture and pay gaps in 2022. None of this information is retrievable from the live site: the homepage, services pages, sector pages, and about page all return 403. The only pages that return content are /en/careers (954 words of rich text about culture and recruitment) and /en/insights and /en/people (thin JS shells of 30 and 58 words respectively). The site's own tagline, "A law firm built around people", is visible only via Wayback Machine snapshots.
Schema Posture
The Organization schema present on accessible pages is minimal: name, URL, and a single ContactPoint with telephone +44 (0)20 7861 4000. No sameAs links, no description, no foundingDate, no numberOfEmployees, no areaServed, no knowsAbout for practice areas. The WebPage schema includes datePublished but no author or about fields. The careers page has no JobPosting schema despite listing multiple recruitment contacts and role categories. The insights page has no Article or BlogPosting schema for its content items.
External Signals
The site links to LinkedIn, Instagram, Twitter/X, and YouTube in its footer. The DNS TXT records reveal integrations with Mimecast (email security), DocuSign, LogMeIn, Atlassian, Canva, Miro, Pexip, and Exclaimer — a heavy SaaS stack. The site is hosted on Cloudflare with A records pointing to 4.250.248.98 and 51.104.28.70. No external press, reviews, or directory listings were discoverable via search during this audit.
Surprising Findings
The /en/careers page is the only substantive page (954 words) that renders without a JS challenge; it contains detailed employee testimonials, recruitment team contacts, and FAQ content. This is an odd priority: recruitment content is fully accessible while the firm's core practice area pages, sector expertise, and about page are all locked behind Cloudflare. The robots.txt is an unusual hybrid: it uses the emerging Content-Signal directive alongside traditional User-agent blocks, but WAF enforcement makes those signals largely redundant, since bots are stopped before they reach most content. The datePublished in the schema on the insights page reads 2026-03-03. The audit flagged this as a future date, but it precedes the audit's completion date of 30 May 2026, so the value should be checked against the page's real publication date before it is treated as a CMS misconfiguration or placeholder timestamp.
Findings
Cloudflare WAF blocks every AI crawler and standard browser (severity: High)
The homepage and all core pages (services, sectors, about, locations, contact) return a 403 Cloudflare JS challenge to every tested bot and even a standard browser. No AI crawler can access any of this core content.
What to change: Remove the Cloudflare JS challenge for known AI crawler IP ranges or serve a static HTML version of the site to bots. Alternatively, allowlist GPTBot and ClaudeBot in the WAF. Google-Extended is a robots.txt control token rather than a separate crawling user agent, so it is governed through robots.txt, not WAF rules.
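The per-agent comparison behind this finding can be approximated with a short script. This is a minimal sketch and not the audit's compare_bot_access tool: the user-agent strings are simplified tokens, `fetch_status`, `compare_access`, and `blocked_agents` are hypothetical names, and any status of 400 or above is treated as blocked without inspecting the challenge page.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Simplified user-agent tokens (assumption: real crawlers send longer strings).
USER_AGENTS: Dict[str, str] = {
    "GPTBot": "GPTBot",
    "ClaudeBot": "ClaudeBot",
    "browser": "Mozilla/5.0",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET sent with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still the result.
        # Connection-level failures (URLError) are deliberately left to propagate.
        return error.code


def compare_access(
    url: str, fetch: Callable[[str, str], int] = fetch_status
) -> Dict[str, int]:
    """Request the same URL once per user agent and collect the status codes."""
    return {name: fetch(url, ua) for name, ua in USER_AGENTS.items()}


def blocked_agents(statuses: Dict[str, int]) -> List[str]:
    """List the agents that received an error status (400 or above)."""
    return sorted(name for name, status in statuses.items() if status >= 400)
```

Calling `blocked_agents(compare_access(url))` before and after a WAF change gives a quick regression check. The `fetch` parameter exists so the comparison logic can be exercised without network access.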
Robots.txt explicitly disallows all major AI crawlers (severity: High)
The robots.txt file at /robots.txt disallows GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, Amazonbot, meta-externalagent, and CloudflareBrowserRenderingCrawler via Disallow: /. It also includes a Content-Signal: ai-train=no directive.
What to change: Remove the Disallow: / directives for GPTBot, ClaudeBot, and Google-Extended to allow AI crawlers to index the site. Keep blocks for other bots if desired.
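The effect of these directives can be checked offline with Python's standard-library robots.txt parser. The sample file below is illustrative only: it is modelled on the directives described in this finding and is not a copy of the file served by fieldfisher.com, and `allowed_agents` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt modelled on the directives described in the finding;
# it is NOT the actual file served by the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Content-Signal: ai-train=no
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/"):
    """Map each user-agent token to whether robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; directives the standard-library
    # parser does not recognise, such as Content-Signal, are skipped.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "Google-Extended", "Mozilla/5.0"],
)
print(result)
```

Deleting a crawler's Disallow group from the sample flips its entry to True. This only reflects the stated policy: a crawler that robots.txt permits can still be stopped by the WAF challenge.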
No llms.txt file available (severity: Medium)
The llms.txt endpoint returns a connection reset error, so no llms.txt file could be retrieved and none appears to be published. This file would help AI crawlers discover key pages and understand site structure.
What to change: Create an llms.txt file at /llms.txt listing the most important pages (e.g., /en/services, /en/sectors/technology, /en/insights) with brief descriptions.
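A file of this kind could be generated rather than hand-written. The sketch below follows the layout of the llms.txt proposal (a title heading, a one-line summary, then a section of links with descriptions). The three paths come from the recommendation above; the base URL form, the summary wording, and the page descriptions are placeholders, and `build_llms_txt` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def build_llms_txt(
    title: str,
    summary: str,
    section: str,
    pages: Iterable[Tuple[str, str]],
    base_url: str,
) -> str:
    """Render a minimal llms.txt document as Markdown text."""
    lines = [f"# {title}", "", f"> {summary}", "", f"## {section}", ""]
    for path, description in pages:
        lines.append(f"- [{path}]({base_url}{path}): {description}")
    return "\n".join(lines) + "\n"


# Placeholder descriptions; replace with text approved by the site owner.
PAGES = [
    ("/en/services", "Overview of practice areas"),
    ("/en/sectors/technology", "Technology sector expertise"),
    ("/en/insights", "Articles and commentary"),
]

LLMS_TXT = build_llms_txt(
    title="Fieldfisher",
    summary="European law firm (placeholder summary)",
    section="Key pages",
    pages=PAGES,
    base_url="https://fieldfisher.com",  # assumption: canonical host may differ
)
print(LLMS_TXT)
```

The file only helps if /llms.txt is reachable, so it needs the same WAF exemption as the pages it lists.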
Sitemap.xml returns 403 and is inaccessible (severity: High)
The sitemap.xml file returns a 403 Cloudflare JS challenge, preventing crawlers from discovering the site's URL structure. No sitemap was retrievable.
What to change: Ensure sitemap.xml is served without a JS challenge, or place it at a path that bypasses the WAF for crawlers.
Core practice area and sector pages return 403 (severity: High)
Pages describing services, sectors, locations, about, and contact all return 403 Cloudflare JS challenges. Only /en/careers, /en/insights, and /en/people return any content, with insights and people being thin JS shells.
What to change: Allow AI crawlers to access these pages by removing the JS challenge for known bot user agents, or serve a static HTML version.
Insights and people pages are thin JS shells (severity: Medium)
The /en/insights page returns only 30 words and /en/people returns 58 words, suggesting they are JavaScript-rendered shells with minimal initial HTML content. AI crawlers that execute JS may see more, but most do not.
What to change: Implement server-side rendering or pre-rendering for these pages so that static HTML content is served to crawlers.
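One way to see how thin a page's initial HTML is: count the words a crawler gets without running JavaScript. The sketch below is an assumption about how such a count can be made and is not the audit's own method; `VisibleTextCounter` and `count_visible_words` are hypothetical names, and the sample markup is invented for illustration.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Count whitespace-separated words outside script-like elements."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside the skipped elements counts as visible.
        if self._skip_depth == 0:
            self.words += len(data.split())


def count_visible_words(html: str) -> int:
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()  # flush any buffered text
    return counter.words


# Invented example of a JS shell: the real content arrives only after scripts run.
SAMPLE_SHELL = (
    '<body><div id="app"></div>'
    '<script>window.__DATA__ = {"items": [1, 2, 3]};</script>'
    "<p>Loading insights</p></body>"
)
print(count_visible_words(SAMPLE_SHELL))  # 2
```

Running the same count on the server-rendered or pre-rendered HTML shows whether the change puts the article text into the initial response.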
Organization schema lacks key fields (severity: Medium)
The Organization schema on accessible pages includes only name, URL, and a single ContactPoint. Missing fields: description, sameAs, foundingDate, numberOfEmployees, areaServed, knowsAbout for practice areas.
What to change: Add description, sameAs links to social profiles, foundingDate, numberOfEmployees, areaServed, and knowsAbout for each practice area to the Organization schema.
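A fuller Organization block could be built and serialized as JSON-LD along these lines. This is a sketch under stated assumptions: the name, telephone number, and practice themes come from this report, while the URL form, the description, and the sameAs links are placeholders. foundingDate and numberOfEmployees are left out because the report does not supply values for them. `organization_jsonld` is a hypothetical helper.

```python
import json


def organization_jsonld(
    name,
    url,
    telephone,
    description=None,
    same_as=(),
    knows_about=(),
    area_served=(),
):
    """Build a schema.org Organization dict, omitting empty optional fields."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "contactPoint": {"@type": "ContactPoint", "telephone": telephone},
    }
    optional = {
        "description": description,
        "sameAs": list(same_as),
        "knowsAbout": list(knows_about),
        "areaServed": list(area_served),
    }
    data.update({key: value for key, value in optional.items() if value})
    return data


ORG = organization_jsonld(
    name="Fieldfisher",
    url="https://fieldfisher.com/",  # assumption: canonical URL may differ
    telephone="+44 (0)20 7861 4000",
    description="European law firm (placeholder description)",
    # Placeholder profile URLs; substitute the firm's real social profiles.
    same_as=["https://example.com/linkedin-profile", "https://example.com/youtube-channel"],
    knows_about=["Technology", "Media", "Financial services"],
)
print(json.dumps(ORG, indent=2))
```

Omitting empty fields keeps the markup free of blank or guessed values, which matters more than field count for a schema that other systems treat as authoritative.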
Careers page lacks JobPosting schema (severity: Medium)
The careers page lists multiple recruitment contacts and role categories but has no JobPosting structured data, reducing its visibility in AI-driven job searches.
What to change: Add JobPosting schema for each open role, including title, description, location, and application URL. JobPosting describes an individual vacancy, so the markup belongs on the pages that list specific openings rather than on role categories.
Insights page lacks Article or BlogPosting schema (severity: Medium)
The insights page has no Article or BlogPosting schema for its content items, reducing the chance that individual articles appear in AI-generated summaries.
What to change: Add Article or BlogPosting schema to each insight item with headline, datePublished, author, and description.
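Per-item markup of this kind is usually embedded as a JSON-LD script element in each article's HTML. The sketch below builds a BlogPosting with the four fields named above and wraps it in a script tag. Every value in the example is a placeholder, and `blog_posting_jsonld` and `to_script_tag` are hypothetical helpers.

```python
import json


def blog_posting_jsonld(headline, date_published, author_name, description):
    """Build a schema.org BlogPosting dict with the four recommended fields."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author_name},
        "description": description,
    }


def to_script_tag(data):
    """Serialize a JSON-LD dict into an embeddable script element."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder article data for illustration only.
POSTING = blog_posting_jsonld(
    headline="Example insight headline",
    date_published="2024-01-15",
    author_name="Example Author",
    description="One-sentence summary of the article.",
)
print(to_script_tag(POSTING))
```

Because the tag is plain HTML, it only reaches crawlers if it is present in the server-rendered response and not injected by client-side JavaScript.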
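Schema datePublished value 2026-03-03 needs verification (severity: Low)
The WebPage schema on the insights page includes a datePublished of 2026-03-03. The audit flagged this as a future date, although it falls before the audit's completion date of 30 May 2026. Either way, a date that does not match the page's real publication date may confuse crawlers and reduce trust in the site's metadata.
What to change: Check datePublished against the actual publication date and correct it, or remove it if not applicable.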
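Whether a datePublished value counts as a future date depends on the reference date it is compared with. A minimal sketch of that check follows, using the audit's completion date of 30 May 2026 as one reference. It assumes the value starts with an ISO date (YYYY-MM-DD), and `is_future` is a hypothetical helper.

```python
from datetime import date


def is_future(date_published: str, reference: date) -> bool:
    """Return True if the ISO date in date_published falls after reference."""
    # Keep only the leading YYYY-MM-DD so full timestamps are also accepted.
    published = date.fromisoformat(date_published[:10])
    return published > reference


print(is_future("2026-03-03", date(2026, 5, 30)))  # False
print(is_future("2026-03-03", date(2026, 1, 1)))   # True
```

A CMS could run the same comparison at publish time, with the current date as the reference, to reject timestamps that have not yet occurred.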
No external press, reviews, or directory listings found (severity: Medium)
The audit's web searches for Fieldfisher on legal directories (Chambers, Legal 500), news sites, and review platforms returned zero results. This limits the external signals that AI models use to validate authority.
What to change: Keep the firm's Chambers and Legal 500 submissions current (both directories rank firms through research submissions and client referees rather than open reviews), encourage client reviews on open platforms such as Google, and publish press releases and thought leadership to generate news coverage.
LLM cold knowledge about Fieldfisher is richer than live site content (severity: High)
The LLM knows Fieldfisher as a European law firm strong in tech, media, and financial services, representing Google and Meta, with a 'Concert' legal tech platform and 26+ offices. None of this is retrievable from the live site, which returns 403 on core pages.
What to change: Ensure that the live site's core pages (services, sectors, about) are accessible to AI crawlers and contain the information that the LLM already knows, to align online presence with reputation.
What's working
- Careers page provides rich, accessible content — The /en/careers page returns 954 words of detailed content including employee testimonials, recruitment contacts, and FAQ, and is accessible without a JS challenge. This is the only substantive page on the site.
- Robots.txt is served and accessible — The robots.txt file is served at /robots.txt with a 200 status, allowing crawlers to read the directives. This is a basic but necessary file for crawler communication.
- DNS TXT records reveal extensive SaaS integrations — The domain has 23 TXT records indicating integrations with Mimecast, DocuSign, LogMeIn, Atlassian, Canva, Miro, Pexip, and Exclaimer, suggesting a modern tech stack.
- Social media links present in footer — The site links to LinkedIn, Instagram, Twitter/X, and YouTube in its footer, providing external signals and pathways for users and crawlers to find the firm on other platforms.
- Wayback Machine snapshots preserve historical content — Snapshots of the homepage from January 2024 and August 2024 are available, showing the tagline 'A law firm built around people' and some content. This provides a fallback for historical reference.
- Cloudflare protection mitigates DDoS and abuse — The site uses Cloudflare WAF, which provides strong protection against DDoS attacks and malicious traffic, ensuring uptime and security.
- Careers page includes ContactPoint schema — The Organization schema on the careers page includes a ContactPoint with telephone number, providing a basic structured data signal for contact information.
- Content-Signal directive used in robots.txt — The robots.txt includes a Content-Signal: ai-train=no directive, which is an emerging standard for communicating AI training preferences. This shows awareness of AI crawler governance.