AI Site Grade
fivetran.com — AI Site Grade
Fivetran's AI visibility is limited by four issues: /robots.txt returns a 404, marketing pages carry no structured data, the llms.txt file is a 1.3MB unprioritized index, and cold LLM knowledge cites a connector count 2-3x lower than the site now claims.
- Findings: 10
- Evidence checks: 24
- Completed: 30 May 2026
Analysis
Fivetran's /robots.txt returns a 404 (a Next.js HTML error page), so no AI crawler directives exist anywhere on the domain. With nothing to restrict them, each of the listed AI bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User, anthropic-ai) receives a full 200 with content identical to what a browser visitor sees.
Missing robots.txt — No AI Crawler Governance
The request to fivetran.com/robots.txt returns HTTP 404 with a 24KB Next.js HTML error page, so no robots.txt file exists and there are no directives for any crawler, AI or otherwise: no Disallow for /login, /signup, /dashboard, or any other internal path, and no Allow or Disallow for GPTBot, ClaudeBot, Google-Extended, PerplexityBot, Bytespider, or any other AI user-agent. The Wayback Machine has no historical snapshot of a robots.txt either, which suggests the file has never existed. Crawlers that follow the Robots Exclusion Protocol (RFC 9309) treat a 4xx response for robots.txt as "unavailable" and may fetch any resource, and compare_bot_access is consistent with that: all major AI bots receive HTTP 200 with the full 127KB page, except Bytespider, which gets a Cloudflare 403 block. The site runs behind Cloudflare, with DNS pointing to 35.236.237.87 (an address in a block registered to Google Cloud, not AWS). DNS TXT records include openai-domain-verification and anthropic-domain-verification entries, which suggests the domain has been verified with both OpenAI and Anthropic.
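To make the effect of the 404 concrete, here is a minimal sketch of how a crawler that follows RFC 9309 could map the status code of a robots.txt fetch to a crawl policy. The function name and the returned labels are illustrative assumptions, not taken from any crawler's actual code.

```python
def robots_policy_for_status(status: int) -> str:
    """Map the HTTP status of a /robots.txt fetch to a crawl policy (RFC 9309, section 2.3.1)."""
    if 200 <= status < 300:
        return "parse-rules"    # file exists: obey its directives
    if 400 <= status < 500:
        return "allow-all"      # "unavailable": the crawler may access any resource
    if 500 <= status < 600:
        return "disallow-all"   # "unreachable": the crawler must assume complete disallow
    return "undefined"          # redirects and other codes need separate handling


# The 404 observed in this report falls in the "unavailable" branch.
print(robots_policy_for_status(404))  # prints "allow-all"
```

Under this reading, the 200 responses seen by AI bots are the expected result of a missing file rather than a separate configuration choice.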
No Structured Data on Marketing Pages
The homepage, pricing page, platform-overview page, blog index, case studies index, and people pages all contain zero JSON-LD schema of any type. No Organization, WebSite, Product, FAQPage, BreadcrumbList, or SoftwareApplication schema is present. The only page with any schema is a single blog post (/blog/how-i-used-ai-agents-to-optimize-product-operations), which carries a BlogPosting schema with a datePublished of "May 08, 2026" and a dateModified of "May 26, 2026." The checks flagged both as future dates, but both fall before this report's completion date of 30 May 2026, so that flag needs re-verification. As quoted, the values are also display strings rather than the ISO 8601 form (for example 2026-05-08) that schema.org date properties expect. The pricing page has an FAQ section with real questions but no FAQPage schema wrapping them.
Cold-Knowledge Gap: Connector Count and AI Positioning
Cold LLM knowledge describes Fivetran as having "300+ pre-built connectors." The homepage now claims "700+ sources" and "900+ sources and destinations," a 2-3x gap between what AI models know and what the site asserts. Cold knowledge also mentions a 2021 Permira acquisition ($5.6B) and pricing criticism, neither of which appears anywhere on the marketing site. The site's current positioning leans heavily toward AI: "The data foundation for AI," "Automated data for autonomous agents," and "Data foundation for AI" are the primary H1/H2 messages. Cold knowledge does not reflect this AI-first framing at all and instead describes Fivetran as an "ELT pipeline" company.
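The 2-3x figure follows directly from the three quoted counts. A small sketch of the arithmetic, treating each "N+" claim as its lower bound (an assumption, since the exact counts are not given):

```python
def gap_ratio(site_count: int, cold_count: int) -> float:
    """Ratio of the count claimed on the site to the count in cold LLM knowledge."""
    return round(site_count / cold_count, 2)


COLD_KNOWLEDGE = 300  # "300+ pre-built connectors"
SITE_CLAIMS = {
    "700+ sources": 700,
    "900+ sources and destinations": 900,
}

for label, count in SITE_CLAIMS.items():
    print(label, gap_ratio(count, COLD_KNOWLEDGE))
```

The two ratios come out at 2.33 and 3.0, which is where the 2-3x range comes from.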
llms.txt Exists but Is a Documentation Dump
Fivetran serves a 1.3MB llms.txt at /llms.txt. It is a flat, exhaustive listing of documentation pages (thousands of entries) with markdown links and brief descriptions. It references a docs-sitemap.xml sub-sitemap. While the file exists and is technically valid, its sheer size (1.3 million bytes) and lack of prioritization or summarization make it impractical for an LLM to consume efficiently in context. The file is a raw index rather than a curated AI-friendly summary of the brand's value proposition, key products, and differentiators.
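For a rough sense of scale, the sketch below converts the file size into an approximate token count. The 4-bytes-per-token ratio is a common rule of thumb for English text, not a measured value for this file, and the two context-window sizes are hypothetical examples rather than the limits of any specific model.

```python
def estimated_tokens(size_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Very rough token estimate; bytes_per_token is an assumed heuristic."""
    return round(size_bytes / bytes_per_token)


LLMS_TXT_BYTES = 1_300_000  # size reported for /llms.txt
tokens = estimated_tokens(LLMS_TXT_BYTES)
print("estimated tokens:", tokens)

# Hypothetical context-window sizes, for comparison only.
for window in (128_000, 200_000):
    print(window, "exceeded" if tokens > window else "fits")
```

Under these assumptions the file comes to roughly 325,000 tokens, which would exceed both example windows before any question is asked.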
Blog Content Anomalies and Missing Schema
The blog listing page (/blog) shows only 311 words of visible content and relies on JavaScript filtering (category/type/region dropdowns) that may not render for AI crawlers. The featured blog post has a datePublished in May 2026, which the checks flagged as a future date even though it precedes this report's completion date of 30 May 2026. The blog's meta description is generic ("Fivetran is the smartest way to load data into a warehouse") and does not match the current AI-centric homepage messaging. The case studies page similarly has no structured data and relies on a JavaScript-powered filter interface.
Findings
Missing robots.txt leaves AI crawlers ungoverned High
The domain returns a 404 for /robots.txt, so no directives exist for any crawler. All major AI bots (GPTBot, ClaudeBot, etc.) receive full 200 responses with content identical to what browser visitors see, except Bytespider, which is blocked by Cloudflare. The Wayback Machine holds no historical snapshot of a robots.txt.
What to change: Create a robots.txt file that explicitly allows or disallows AI crawlers as desired, and consider blocking sensitive paths like /login or /dashboard.
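One way to draft such a file is sketched below, checked with Python's standard urllib.robotparser. The policy itself (which bots to allow, which paths to block) is an assumption for illustration, as are the /pricing test path and the https://fivetran.com/sitemap.xml URL. The draft uses only Disallow lines because urllib.robotparser applies the first matching rule rather than the longest match described in RFC 9309.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: AI bots may crawl everything except account paths,
# and Bytespider is blocked outright (mirroring the existing Cloudflare block).
DRAFT_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /login
Disallow: /signup
Disallow: /dashboard

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /login
Disallow: /signup
Disallow: /dashboard

Sitemap: https://fivetran.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(DRAFT_ROBOTS_TXT)
for agent, path in [("GPTBot", "/pricing"), ("GPTBot", "/login"), ("Bytespider", "/")]:
    print(agent, path, rules.can_fetch(agent, path))
```

Running a draft through a parser like this before deploying it catches grouping mistakes, such as a Disallow line that ends up attached to the wrong user-agent block.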
Zero JSON-LD schema on all marketing pages High
The homepage, pricing, platform-overview, blog index, case studies, and people pages contain no structured data of any type. No Organization, WebSite, Product, FAQPage, or BreadcrumbList schema is present, reducing AI understanding of the site's content and entity relationships.
What to change: Add JSON-LD structured data (Organization, WebSite, Product, FAQPage, BreadcrumbList) to all relevant marketing pages.
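As a starting point, the sketch below builds a minimal Organization and WebSite graph and wraps it in a script tag. The function name, the https://fivetran.com/ URL form, and the choice of properties are assumptions for illustration; the description string is the homepage headline quoted in this report. A production version would likely add fields such as logo and sameAs.

```python
import json


def site_jsonld(name: str, url: str, description: str) -> str:
    """Return a <script> tag holding Organization and WebSite JSON-LD."""
    org_id = f"{url}#organization"
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": name,
                "url": url,
                "description": description,
            },
            {
                "@type": "WebSite",
                "@id": f"{url}#website",
                "name": name,
                "url": url,
                "publisher": {"@id": org_id},
            },
        ],
    }
    # Escape "</" so the JSON cannot close the script element early.
    body = json.dumps(graph, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = site_jsonld("Fivetran", "https://fivetran.com/", "The data foundation for AI")
print(snippet)
```

Linking the WebSite node to the Organization node by @id keeps the entity definition in one place, so page-level schema added later can reference the same identifier.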
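Blog post schema dates need verification Medium
The only page with structured data, a blog post, has a datePublished of 'May 08, 2026' and a dateModified of 'May 26, 2026'. The checks flagged both as future dates, yet both precede this report's completion date of 30 May 2026, so the flag should be re-verified. As quoted, the values are also display strings rather than ISO 8601 dates, which schema.org date properties expect.
What to change: Confirm that datePublished and dateModified match the actual publication and modification dates, and emit them in ISO 8601 form (for example 2026-05-08).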
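As a sanity check on the quoted values, the sketch below parses both strings, compares them with this report's completion date of 30 May 2026, and prints them in ISO 8601 form. The helper name and the "Month DD, YYYY" input format are assumptions based on how the values are quoted here.

```python
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_display_date(text: str) -> date:
    """Parse a 'Month DD, YYYY' string such as 'May 08, 2026'."""
    month_name, day, year = text.replace(",", "").split()
    return date(int(year), MONTHS[month_name], int(day))


REPORT_COMPLETED = date(2026, 5, 30)
QUOTED = {"datePublished": "May 08, 2026", "dateModified": "May 26, 2026"}

for field, raw in QUOTED.items():
    parsed = parse_display_date(raw)
    status = "future" if parsed > REPORT_COMPLETED else "not future"
    print(field, parsed.isoformat(), status)
```

Both values print as "not future" relative to the completion date, and the isoformat() output (2026-05-08, 2026-05-26) is the form to emit in the schema.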
Cold LLM knowledge understates connector count by 2-3x Medium
Cold knowledge describes Fivetran as having '300+ pre-built connectors,' while the site claims '700+ sources' and '900+ sources and destinations.' This gap means AI models may underestimate Fivetran's capabilities.
What to change: Ensure the site prominently and consistently states the connector count in structured data and visible text to help AI models update their knowledge.
llms.txt is a 1.3MB raw documentation dump Medium
The llms.txt file exists but is an exhaustive, unprioritized listing of thousands of documentation pages. Its size and lack of summarization make it impractical for LLMs to consume efficiently in context.
What to change: Restructure llms.txt to include a concise summary of the brand, key products, and differentiators, with prioritized links rather than a flat dump.
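A curated file could be generated along the lines of the sketch below, which follows the layout of the public llms.txt proposal: an H1 title, a blockquote summary, then H2 sections of annotated links. The summary wording, section names, link notes, and the https://fivetran.com URL prefix are placeholders; only the /docs/getting-started, /blog, and /sitemap.xml paths come from this report.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str


def build_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render a short llms.txt: title, blockquote summary, then link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- [{link.title}]({link.url}): {link.note}" for link in links)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


text = build_llms_txt(
    "Fivetran",
    "Placeholder summary: the data foundation for AI, with 700+ sources.",
    {
        "Docs": [
            Link("Getting started", "https://fivetran.com/docs/getting-started",
                 "placeholder note on first steps"),
        ],
        "Optional": [
            Link("Blog", "https://fivetran.com/blog", "placeholder note"),
            Link("Sitemap", "https://fivetran.com/sitemap.xml", "full URL index"),
        ],
    },
)
print(text)
```

The exhaustive documentation listing can then live in a separate linked file, so the entry point stays small enough to read in full.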
Blog listing relies on JavaScript for filtering Medium
The blog index page shows only 311 words of visible content and uses JavaScript-powered dropdowns for category/type/region filtering. AI crawlers may not execute this JS, limiting access to full blog content.
What to change: Ensure blog content is accessible without JavaScript, or provide a static HTML fallback for crawlers.
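To check what a crawler that does not run JavaScript would see, one option is to count the visible words in the raw HTML response. The sketch below does this with the standard library; the class name, the set of skipped tags, and the sample markup are illustrative assumptions, and the report's own 311-word figure may have been measured differently.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect words outside tags whose content is never rendered as text."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


# Hypothetical page where the post list exists only as data inside a script.
SAMPLE = """<html><body><h1>Blog</h1>
<script>window.__POSTS__ = [{"title": "Post one"}, {"title": "Post two"}];</script>
<div id="posts"></div></body></html>"""

print(visible_word_count(SAMPLE))
```

In the sample, only the heading counts as visible text, because the post titles exist solely inside the script payload. A large gap between this count and what a browser displays is a sign that content depends on client-side rendering.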
Case studies page uses JavaScript filter interface Medium
The case studies page relies on JavaScript for filtering, which may not be accessible to AI crawlers, reducing discoverability of case study content.
What to change: Provide a static HTML version of the case studies listing or ensure server-side rendering for crawlers.
Pricing page FAQ lacks FAQPage schema Low
The pricing page contains an FAQ section with real questions but no FAQPage structured data, so the question-and-answer pairs are not exposed in machine-readable form for AI systems to parse and quote.
What to change: Add FAQPage schema to the FAQ section on the pricing page.
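A minimal sketch of the FAQPage markup, generated from question-and-answer pairs. The function name and the two sample pairs are placeholders; the real text should be copied from the pricing page so the markup matches what visitors see.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder pairs; replace with the actual pricing FAQ text.
faq = faq_jsonld([
    ("Placeholder question one?", "Placeholder answer one."),
    ("Placeholder question two?", "Placeholder answer two."),
])
print(faq)
```

The output goes inside a script element of type application/ld+json on the pricing page.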
Blog meta description does not reflect AI positioning Low
The blog's meta description is 'Fivetran is the smartest way to load data into a warehouse,' which does not match the current AI-centric homepage messaging ('The data foundation for AI'). This inconsistency may confuse AI models.
What to change: Update the blog meta description to align with the current AI-focused brand messaging.
Cold knowledge lacks AI-first positioning Medium
Cold LLM knowledge describes Fivetran as an 'ELT pipeline' company, while the site now heavily emphasizes AI: 'The data foundation for AI.' This gap means AI models may not associate Fivetran with AI data infrastructure.
What to change: Reinforce AI positioning through consistent messaging across the site, structured data, and external signals to help AI models update their knowledge.
What's working
- llms.txt file is served and technically valid: Fivetran serves a 1.3MB llms.txt at /llms.txt that indexes its documentation pages. This is a proactive step for AI discoverability, even if the file needs refinement.
- DNS TXT records confirm OpenAI and Anthropic domain verification: The domain has openai-domain-verification and anthropic-domain-verification TXT records, which suggests it has been verified with both companies. No check in this report shows an effect on crawling or indexing.
- All major AI bots receive full access to homepage: Except for Bytespider (blocked by Cloudflare), all major AI bots (GPTBot, ClaudeBot, Google-Extended, etc.) receive HTTP 200 with full page content, so they can fetch and index the site.
- Blog post includes BlogPosting schema: One blog post includes BlogPosting structured data, which helps AI models understand the content type and publication details, although its date values need verification.
- Sitemap is accessible and contains 80 URLs: A sitemap is served at /sitemap.xml with 80 URLs, helping crawlers discover site content.
- Documentation getting-started page is accessible: The /docs/getting-started page returns 200 with 158 words of content, providing a starting point for AI crawlers to understand the product.
- People page provides a detailed profile: The people page for Anjan Kundavaram contains 499 words of content, offering detailed information about leadership that AI models can index.