AI Site Grade
taylorandfrancis.com — AI Site Grade
Taylor & Francis blocks 9 of 10 major AI crawlers at the Cloudflare edge while actively selling AI training data licensing.
Taylor & Francis has a paradoxical AI-visibility posture: it sells AI training data but blocks nearly all AI crawlers via Cloudflare, has no external visibility for its AI licensing business, and uses minimal schema on its corporate site.
- Findings: 10
- Evidence checks: 28
- Completed: 30 May 2026
Analysis
---
Taylor & Francis blocks 9 of 10 major AI crawlers at the Cloudflare edge while actively selling AI training data licensing
The site's AI-visibility posture is paradoxical. Taylor & Francis operates a dedicated "Content for AI models" sales page offering peer-reviewed content for LLM/SLM/RAG training, yet Cloudflare returns HTTP 403 to GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, ChatGPT-User, anthropic-ai, Bytespider, Applebot-Extended, and Perplexity-User: nine of the ten AI crawler user agents tested, every one except Google-Extended. Only Google-Extended and standard browsers receive a 200 with full content (~150 KB). The robots.txt file contains no AI-bot directives at all (just a generic * rule blocking /wp-admin/ and /?s=), so the blocks are enforced at the Cloudflare WAF layer rather than declared in robots.txt. No /llms.txt exists (the URL returns 404).
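A user-agent probe of the kind that yields status codes like those above can be sketched in Python. This is a minimal sketch, not the tool behind this report: the names `fetch_status`, `probe`, and `blocked` are hypothetical, and sending the bare crawler token as the User-Agent header is an assumption, since real crawlers send longer strings and edge rules may also check source IP ranges.

```python
import urllib.error
import urllib.request

# Crawler tokens named in the report. Sending the bare token as the
# User-Agent is a simplification (assumption); real crawlers send more.
AI_AGENTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "ChatGPT-User",
    "anthropic-ai", "Bytespider", "Applebot-Extended", "Perplexity-User",
    "Google-Extended",
]


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code for a GET sent with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still the result.
        return error.code


def probe(url, agents=AI_AGENTS, fetch=fetch_status):
    """Map each agent name to the status code it receives."""
    return {agent: fetch(url, agent) for agent in agents}


def blocked(results):
    """List the agents that received HTTP 403, in probe order."""
    return [agent for agent, status in results.items() if status == 403]
```

Calling `blocked(probe("https://taylorandfrancis.com/"))` would list the agents refused at the time of the call. A spoofed User-Agent is only indicative, because a WAF can treat verified crawler traffic differently from a script that borrows the crawler's name.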
Cold-Knowledge Gap
A frontier LLM queried cold about Taylor & Francis correctly identifies it as a major academic publisher (founded 1798, part of Informa, 2,700+ journals, Routledge/CRC Press brands) but knows nothing about the company's AI data licensing business. The model mentions "criticism over high subscription costs and open-access policies" as recent reputation signals — a stale narrative that the site itself does not address. The site's actual homepage and corporate pages emphasize "fostering human progress through knowledge," sustainability reporting, and research integrity toolkits, but contain no rebuttal or framing around the AI-training-content controversy.
Schema Posture
Every page carries consistent Organization and WebPage JSON-LD with the same @id fragment identifier (#organization), a valid but minimal pattern. The Organization schema includes a founding date of 1852 (the cold model said 1798, a discrepancy of 54 years), address, sameAs links, and areaServed: Worldwide. The WebPage schema includes datePublished (many pages show dates in late 2025, suggesting a recent site rebuild) and a potentialAction of type SearchAction with a target. No FAQPage, Product, Article (on the main site), BreadcrumbList, or HowTo schemas appear anywhere on the corporate domain. The insights blog (subdomain) uses richer Article + BreadcrumbList + Person schema, but the main corporate site does not.
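A schema inventory like the one described above can be approximated by listing the @type values in a page's JSON-LD blocks. The sketch below uses only the Python standard library; `JsonLdCollector` and `schema_types` are hypothetical names, and it assumes the markup sits in `<script type="application/ld+json">` elements present in the initial HTML.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html):
    """Return the sorted @type values found in a page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the audit
        # A block may be a single node, a list of nodes, or an @graph wrapper.
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            nodes = []
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            found.update([types] if isinstance(types, str) else types)
    return sorted(found)
```

Run against a corporate page, a result of only `Organization` and `WebPage` would match the minimal posture described here, while an insights blog page would be expected to add `Article`, `BreadcrumbList`, and `Person`.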
External Signals
The newsroom subdomain (newsroom.taylorandfrancisgroup.com) shows active press releases about open-access agreements, research integrity partnerships, and sustainability — but zero press releases about AI data licensing deals. A web search for "Taylor & Francis AI licensing" returns no indexed news coverage, no Reddit threads, no industry analyst mentions. The company is selling AI training data through a contact-form-gated page (/medical-publication-professionals/content-for-ai-models/) with a named business development manager, yet this offering has no external visibility whatsoever. The AI-licensing case study page (/partnership/commercial/research-access/law/ai-licensing/) contains only 49 words of visible text and is essentially a lead-gen form gate.
Infrastructure and Content Surprises
The site runs on WP Engine (WordPress) behind Cloudflare, with nginx as the origin server. The X-Powered-By: WP Engine header reveals the stack. The sitemap contains 211+ URLs but is a flat sitemap (not an index), suggesting the site is relatively small for a global publisher. The /journals/ URL redirects to /who-we-serve/ — a 301 that buries journal discovery. The /knowledge/ page is marked noindex, nofollow, making it invisible to search engines despite being a subject-area directory. The insights blog article on AI for academic research returns zero words of visible text from a plain GET (JS-rendered content), meaning AI crawlers that do get through (Google-Extended) may still see an empty shell.
Findings
9 of 10 major AI crawlers blocked at Cloudflare edge (High)
Cloudflare returns HTTP 403 to GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, ChatGPT-User, anthropic-ai, Bytespider, Applebot-Extended, and Perplexity-User. Only Google-Extended and standard browsers receive a 200 with full content.
What to change: Allow major AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) at the Cloudflare WAF layer to enable AI visibility for the AI training data licensing business.
Robots.txt contains no AI-bot directives (Medium)
The robots.txt file only has a generic rule blocking /wp-admin/ and /?s=, with no mention of any AI crawler. All AI crawler blocking is enforced at the Cloudflare WAF layer, not via robots.txt.
What to change: Add explicit AI crawler directives to robots.txt so the file states the same policy that the Cloudflare WAF enforces, making the crawler policy transparent.
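The gap between what robots.txt declares and what the edge enforces can be illustrated with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: `ROBOTS_TXT` is reconstructed from the description above rather than copied from the live file, and `robots_allows` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the report's description; the exact text of the
# live robots.txt is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
"""


def robots_allows(robots_txt, agent, url):
    """Return True if the given robots.txt text permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# With no AI-specific group, GPTBot falls back to the * rules, so the
# file permits the homepage even though the edge is reported to return 403.
homepage_allowed = robots_allows(ROBOTS_TXT, "GPTBot", "https://taylorandfrancis.com/")
```

Under this reconstruction, robots.txt permits GPTBot on the homepage, so a crawler that reads only the file has no way to learn that it will be refused.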
No /llms.txt file published (Medium)
The site returns a 404 for /llms.txt, missing an opportunity to guide AI crawlers to key content and signal AI readiness.
What to change: Publish an /llms.txt file listing key pages for AI crawlers, such as the AI licensing page and journal directories.
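One way to draft such a file is to render it from a short page list. The sketch below follows the layout of the llms.txt proposal as commonly described (an H1 title, a blockquote summary, then H2 sections of Markdown links); the function name `render_llms_txt`, the section heading, and the summary and link descriptions are illustrative assumptions, while the two paths come from this report.

```python
def render_llms_txt(title, summary, sections):
    """Render llms.txt text: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


BASE = "https://taylorandfrancis.com"

# Illustrative content only; the wording is a placeholder, not site copy.
EXAMPLE = render_llms_txt(
    "Taylor & Francis",
    "Academic publisher of peer-reviewed journals and books.",
    {
        "AI licensing": [
            (
                "Content for AI models",
                BASE + "/medical-publication-professionals/content-for-ai-models/",
                "Peer-reviewed content offered for LLM, SLM, and RAG training",
            ),
            (
                "AI licensing case study",
                BASE + "/partnership/commercial/research-access/law/ai-licensing/",
                "Case study on licensing content for AI use",
            ),
        ],
    },
)
```

Writing `EXAMPLE` to the web root as `llms.txt` would replace the current 404, though crawlers blocked at the edge would still be unable to fetch it.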
Cold LLM knows nothing about AI data licensing business (High)
A frontier LLM queried cold about Taylor & Francis correctly identifies it as a major academic publisher but knows nothing about its AI data licensing offerings. The model mentions stale reputation signals (high subscription costs, open-access policies) that the site does not address.
What to change: Publish a dedicated press release or blog post about the AI licensing business and ensure it is indexed and crawlable by AI bots.
AI licensing case study page is thin and gated (Medium)
The AI licensing case study page at /partnership/commercial/research-access/law/ai-licensing/ contains only 49 words of visible text and is essentially a lead-gen form gate, providing almost no substantive content for AI crawlers.
What to change: Expand the AI licensing case study page with detailed, crawlable content describing the offering, benefits, and use cases.
Zero external news coverage or discussion of AI licensing (High)
Web searches for 'Taylor & Francis AI licensing' return no indexed news articles, Reddit threads, or industry analyst mentions. The newsroom subdomain has no press releases about AI data licensing deals.
What to change: Issue a press release about the AI licensing business and engage with industry media to generate external visibility.
Corporate site uses minimal schema markup (Medium)
The main corporate site only has Organization and WebPage JSON-LD. No FAQPage, Product, Article, BreadcrumbList, or HowTo schemas are present, limiting rich results and AI understanding.
What to change: Add relevant schemas such as Article for blog posts, BreadcrumbList for navigation, and FAQPage for common questions.
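As an illustration of one of the suggested additions, BreadcrumbList markup can be generated from a page's navigation trail. This is a sketch, assuming schema.org's standard `BreadcrumbList` and `ListItem` shape; the helper names `breadcrumb_jsonld` and `as_script_tag` and the breadcrumb labels are hypothetical.

```python
import json


def breadcrumb_jsonld(trail):
    """Build BreadcrumbList JSON-LD from an ordered list of (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": index, "name": name, "item": url}
            for index, (name, url) in enumerate(trail, start=1)
        ],
    }


def as_script_tag(data):
    """Serialize JSON-LD into a script element for the page head."""
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


# Labels are placeholders; the second path appears in this report.
TRAIL = [
    ("Home", "https://taylorandfrancis.com/"),
    ("Who we serve", "https://taylorandfrancis.com/who-we-serve/"),
]
MARKUP = as_script_tag(breadcrumb_jsonld(TRAIL))
```

On a WordPress site this markup would more likely come from the theme or an SEO plugin than from a standalone script, so the sketch is mainly useful for checking the expected output shape.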
Knowledge page is noindex, nofollow (Medium)
The /knowledge/ page, which serves as a subject-area directory, is marked noindex, nofollow, which keeps it out of search indexes and tells crawlers not to follow its links.
What to change: Remove the noindex, nofollow directive from the /knowledge/ page to allow indexing and crawling.
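A quick check for this directive is to read the robots meta tag from the page's HTML. The sketch below uses the standard library only; `RobotsMetaParser` and `robots_directives` are hypothetical names, and it assumes the directive is delivered as a `<meta name="robots">` tag. The same directive can also be sent in an X-Robots-Tag response header, which this sketch does not inspect.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

After the fix, `robots_directives` on the /knowledge/ page's HTML should no longer contain `noindex` or `nofollow`.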
Insights blog article returns zero visible text (High)
A plain GET request to an insights blog article on AI for academic research returns zero words of visible text, indicating JS-rendered content that AI crawlers may not execute.
What to change: Ensure that key content is server-side rendered or included in the initial HTML response for AI crawlers.
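The "visible words from a plain GET" measurement can be approximated by stripping script, style, and similar elements from the raw HTML and counting what remains. This is a minimal sketch with hypothetical names (`VisibleTextParser`, `visible_word_count`); the exact counting rules behind the report's figures are not known, so the numbers it yields may differ.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside script, style, or similar elements."""

    SKIPPED = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_word_count(html):
    """Count words in the text a non-JavaScript client would see."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(re.findall(r"\w+", " ".join(parser.chunks)))
```

A JavaScript-rendered page typically ships an empty mount element plus script bundles, so this count comes back as zero until the content is rendered on the server or included in the initial response.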
Journals URL redirects, burying journal discovery (Low)
The /journals/ URL redirects to /who-we-serve/, which may confuse crawlers and users looking for journal listings.
What to change: Create a dedicated, crawlable journal directory page at /journals/ instead of redirecting.
What's working
- Dedicated page offering content for AI models — The site has a dedicated page at /medical-publication-professionals/content-for-ai-models/ that explicitly offers peer-reviewed content for LLM, SLM, and RAG training, signaling AI readiness.
- Consistent Organization and WebPage JSON-LD across pages — Every page carries valid Organization and WebPage JSON-LD with consistent @id, founding date, address, sameAs links, and areaServed, providing a solid foundation for AI understanding.
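- Google-Extended user agent is allowed — Requests sent with the Google-Extended user agent receive a 200 response with full content. Google documents Google-Extended as a robots.txt control token rather than a separate crawler user agent, so this result shows that the edge does not block that string; it does not by itself guarantee visibility in Google's AI products.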
- Active newsroom with press releases on open access and research integrity — The newsroom subdomain publishes press releases about open-access agreements and research integrity partnerships, providing positive external signals.
- Sitemap published with 80+ URLs — A sitemap is available at /sitemap.xml with 80 URLs, helping crawlers discover pages.
- Corporate solutions page with clear offering — The /corporate-solutions page provides a clear description of services for corporate clients, aiding AI understanding of the business.