medidata.com — AI Site Grade
Medidata.com gives AI crawlers full access, but its homepage lacks structured data and uses a canonical URL that fragments AI indexing. Broken newsroom and press release infrastructure blocks crawlers from official announcements, and llms.txt returns a 404, all of which limit AI visibility.
- Findings: 9
- Evidence checks: 22
- Completed: 30 May 2026
Analysis
Medidata.com — AI-Visibility Audit
The homepage canonical points to an internal path (/en/page_v4/medidata/) rather than the cleaner /en/ URL, creating a structural split that fragments how AI crawlers index the brand's primary landing page.
Crawler Access
All major AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended) receive a full 200 response whose body (807KB) is byte-identical to what a browser receives. There is no UA-based blocking, no JS shell, and no Cloudflare challenge. The site runs on nginx behind a Varnish cache on Pantheon infrastructure, with a 2-day cache TTL (public, max-age=172800). The robots.txt contains a single catch-all User-agent: * rule and no AI-bot-specific directives: there are no rules for GPTBot, ClaudeBot, or PerplexityBot. The llms.txt path returns a 404 with a full HTML page (605KB), so AI crawlers requesting that convention get a bloated error page instead of a structured content map.
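The robots.txt observation can be reproduced offline. The sketch below uses Python's standard urllib.robotparser to confirm that a file with only a catch-all group leaves the named AI crawlers unblocked. The robots.txt body, the /admin/ rule, and the www.medidata.com host are assumptions made for illustration; the site's actual rules are not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a single catch-all group (not the site's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "OAI-SearchBot", "Bytespider", "Applebot-Extended",
]


def allowed_agents(robots_txt, url, agents):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Assumed host; only the /en/ path comes from the audit.
result = allowed_agents(ROBOTS_TXT, "https://www.medidata.com/en/", AI_CRAWLERS)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With no agent-specific groups, every crawler falls through to the User-agent: * rules, so one check covers all of them.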
Cold-Knowledge Gap
The LLM's prior knowledge describes Medidata as a cloud-based clinical trial platform whose flagship product is Rave EDC, mentions Acorn AI for analytics, and notes the 2019 Dassault Systèmes acquisition for $5.8B. The live site, however, has fully retired the Acorn AI brand — it appears nowhere on the site. The Dassault relationship is acknowledged only on the /en/company/ page ("Medidata, a Dassault Systèmes' brand") but is entirely absent from the homepage, the AI product page, and the experiences page. The cold model also references "data integration challenges and customer complaints about platform complexity" from 2023-2024 — a reputational signal the site makes no attempt to address or counter.
Schema Posture
The homepage carries no FAQ schema, no Organization schema, no Product schema, no SoftwareApplication schema. It has only WebPage, WebSite, BreadcrumbList, and ImageObject — the bare minimum WordPress output. Deeper pages (Experiences, AI, Link) do include FAQPage schema with well-structured Q&A, but the homepage — the page most likely to be retrieved by AI engines — has zero structured data beyond generic page metadata. No Organization schema means AI crawlers get no explicit signal about Medidata's industry, founding date, or Dassault relationship from the homepage.
Content & Answer Signals
The homepage is a 508-word marketing narrative with no FAQ, no comparison tables, no definition patterns. Key statistics (e.g., "90% of FDA novel drug approvals used Medidata in 2025") are rendered as zero-placeholder values (0 %) in the visible text, suggesting CSS-driven number animations that plain-text crawlers cannot parse. The AI product page (/en/clinical-trial-products/medidata-ai/) is stronger at 1,077 words with FAQ schema, but the homepage's thin content and missing numbers mean AI summaries of the brand will underweight its market dominance.
Internal Fragmentation
The /en/newsroom/ URL redirects to a password-protected staging page (/en/v3-demo-16/) titled "V3 Component Demos - 16 - Newsroom" — a demo environment leaked into production. The /en/press_release/ archive returns a 404. A press release URL found in the sitemap also 404s. The sitemap index contains 23 sub-sitemaps with 400+ URLs, but the press release and newsroom infrastructure is broken, meaning AI crawlers cannot access the brand's official announcements through expected paths.
Findings
Homepage canonical points to internal path, fragmenting AI indexing (Medium)
The homepage canonical URL is /en/page_v4/medidata/ instead of the cleaner /en/, creating a structural split that fragments how AI crawlers index the primary landing page.
What to change: Set the homepage canonical to /en/ and ensure all internal links point to that URL.
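A check like this could be scripted. The following is a minimal sketch, using only the standard library, that pulls the canonical link out of a page's HTML and compares it with the expected URL. The sample markup and the www.medidata.com host are assumptions; only the two paths come from the audit.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical head fragment reproducing the reported canonical path.
SAMPLE_HEAD = (
    '<head><link rel="canonical" '
    'href="https://www.medidata.com/en/page_v4/medidata/"></head>'
)
EXPECTED = "https://www.medidata.com/en/"

found = find_canonical(SAMPLE_HEAD)
canonical_ok = found == EXPECTED
print(f"canonical={found!r} ok={canonical_ok}")
```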
Homepage lacks Organization, FAQ, and Product schema (High)
The homepage has only WebPage, WebSite, BreadcrumbList, and ImageObject schema. No Organization, FAQ, Product, or SoftwareApplication schema exists, so AI crawlers get no explicit signals about Medidata's industry, Dassault relationship, or product offerings from the most important page.
What to change: Add Organization, FAQ, and SoftwareApplication schema to the homepage.
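As an illustration of the Organization part of this recommendation, the sketch below builds a small JSON-LD object and wraps it in a script tag. It uses only properties defined by schema.org (name, url, parentOrganization) and only facts stated in this audit. The URL host is an assumption, and fields such as founding date or industry are left out because the audit does not supply their values.

```python
import json


def organization_jsonld(name, url, parent_name):
    """Build a minimal schema.org Organization object with a parent company."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "parentOrganization": {"@type": "Organization", "name": parent_name},
    }


# Assumed host; the /en/ path and both names come from the audit.
data = organization_jsonld(
    "Medidata", "https://www.medidata.com/en/", "Dassault Systèmes"
)
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(data, ensure_ascii=False, indent=2)
    + "\n</script>"
)
print(script_tag)
```

FAQ and SoftwareApplication markup would follow the same pattern with their own types and properties.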
Key statistics rendered as zero-placeholder values invisible to crawlers (Medium)
Statistics like '90% of FDA novel drug approvals used Medidata in 2025' appear as '0 %' in the visible text, likely due to CSS-driven number animations that plain-text crawlers cannot parse.
What to change: Render statistics as static text in the HTML, not as zero-placeholder values animated by CSS.
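One way to detect this pattern is to extract the text a plain-text crawler would see and search it for a bare "0 %". The sketch below does that with the standard library. Both HTML fragments and the data-target attribute are hypothetical; the audit does not show the site's actual markup.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


# A lone 0 followed by %, not the tail of a larger number such as 90%.
ZERO_STAT = re.compile(r"(?<![\d.,])0\s*%")


def zero_placeholders(html):
    extractor = TextExtractor()
    extractor.feed(html)
    return ZERO_STAT.findall(" ".join(extractor.chunks))


# Hypothetical markup: an animated counter versus a statically rendered value.
ANIMATED = '<div><span data-target="90">0</span> %</div><p>of FDA novel drug approvals</p>'
STATIC = "<div><span>90</span>%</div><p>of FDA novel drug approvals</p>"

print(zero_placeholders(ANIMATED), zero_placeholders(STATIC))
```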
Newsroom URL redirects to password-protected staging page (High)
The /en/newsroom/ URL redirects to a password-protected staging page titled 'V3 Component Demos - 16 - Newsroom', blocking AI crawlers from accessing official announcements.
What to change: Fix the newsroom URL to serve the actual newsroom content, not a staging redirect.
Press release archive returns 404 (High)
The /en/press_release/ URL returns a 404, and a press release URL found in the sitemap also 404s, meaning AI crawlers cannot access official announcements through expected paths.
What to change: Restore the press release archive and ensure all press release URLs resolve correctly.
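Once the paths are restored, a check along these lines could confirm that they resolve. The sketch below classifies fetch results that have already been recorded, so it runs without network access. The Fetch record and classify helper are hypothetical names, and the 200 status on the newsroom redirect is assumed for illustration; the audit reports the redirect target but not its status code.

```python
from dataclasses import dataclass


@dataclass
class Fetch:
    requested: str  # path that was requested
    status: int     # final HTTP status code
    final: str      # path after following redirects


def classify(fetch):
    """Label a recorded fetch as ok, redirected, blocked, not_found, or other."""
    if fetch.status == 404:
        return "not_found"
    if fetch.status in (401, 403):
        return "blocked"
    if fetch.final != fetch.requested:
        return "redirected"
    return "ok" if fetch.status == 200 else "other"


# Paths come from the audit; the newsroom status code is an assumption.
fetches = [
    Fetch("/en/", 200, "/en/"),
    Fetch("/en/newsroom/", 200, "/en/v3-demo-16/"),
    Fetch("/en/press_release/", 404, "/en/press_release/"),
]

broken = {}
for f in fetches:
    label = classify(f)
    if label != "ok":
        broken[f.requested] = label
print(broken)
```

Running the same classification over every URL in the sitemap would surface any other entries that no longer resolve.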
llms.txt returns 404 with bloated error page (Medium)
The llms.txt file returns a 404 with a full HTML page (605KB), meaning AI crawlers hitting that convention get a bloated error page instead of a structured content map.
What to change: Create a valid llms.txt file with a summary of the site's content and links to key pages.
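For illustration, the sketch below generates a small file in the layout described by the llms.txt proposal: an H1 title, a blockquote summary, and H2 sections of annotated links. The summary wording, section names, link notes, and the www.medidata.com host are placeholders; only the two page paths come from the audit.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt-style Markdown document.

    sections maps a heading to a list of (label, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


BASE = "https://www.medidata.com"  # assumed host

text = build_llms_txt(
    "Medidata",
    "Clinical trial platform, a Dassault Systèmes brand.",  # placeholder summary
    {
        "Products": [
            ("Medidata AI", f"{BASE}/en/clinical-trial-products/medidata-ai/",
             "AI product page with FAQ"),
        ],
        "Company": [
            ("About", f"{BASE}/en/company/", "Company overview"),
        ],
    },
)
print(text)
```

The file would be served as plain text at /llms.txt with a 200 status, replacing the current HTML error page.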
Retired Acorn AI brand still in LLM prior knowledge, unaddressed on site (Low)
LLM prior knowledge references Acorn AI as Medidata's analytics brand, but the live site has fully retired that brand. The site makes no attempt to address or clarify this discrepancy.
What to change: Add content on the AI product page or homepage that explicitly states the Acorn AI brand has been retired and integrated into Medidata AI.
Dassault relationship absent from homepage and key product pages (Medium)
The Dassault Systèmes relationship is acknowledged only on the /en/company/ page, not on the homepage, AI product page, or experiences page, so AI crawlers may not associate Medidata with its parent company.
What to change: Add a mention of 'a Dassault Systèmes brand' to the homepage and key product pages.
Reputational signals about platform complexity unaddressed (Low)
LLM prior knowledge includes references to 'data integration challenges and customer complaints about platform complexity' from 2023-2024, but the site makes no attempt to counter or address these signals.
What to change: Add content that proactively addresses common integration challenges and highlights improvements.
What's working
- All major AI crawlers receive a full 200 response: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, and Applebot-Extended all get the same byte content as a browser, with no UA-based blocking or JS shells.
- FAQPage schema present on key product pages: Deeper pages such as Experiences, AI, and Link include FAQPage schema with well-structured Q&A, helping AI crawlers understand common questions about those products.
- Robots.txt has no AI-bot-specific blocking directives: The robots.txt contains a single catch-all rule with no GPTBot, ClaudeBot, or PerplexityBot directives, so no AI crawler is blocked by accident.
- Sitemap index with 23 sub-sitemaps and 400+ URLs: The sitemap index gives AI crawlers broad coverage for discovering content.
- AI product page has strong content with FAQ schema: The page at /en/clinical-trial-products/medidata-ai/ has 1,077 words of content with FAQ schema, giving AI crawlers substantial information.