AI Site Grade
coursera.org — AI Site Grade
Coursera's robots.txt disallow of /lecture/ is inert because those URLs redirect to accessible /learn/ pages, and the site's llms.txt is one of the largest observed at 3.2MB.
Coursera.org has strong AI crawler access and rich course schema, but its robots.txt restrictions are ineffective, enterprise pages lack schema, and the cold LLM knowledge gap around the Udemy combination and B Corp status needs bridging.
- Findings: 8
- Evidence checks: 21
- Completed: 30 May 2026
Analysis
Coursera.org — AI-Visibility Audit
The site's /lecture/ disallow rules, which robots.txt applies to the major AI crawlers it names, are effectively inert: requests to /lecture/ URLs sent as GPTBot, ClaudeBot, Google-Extended, and PerplexityBot all redirect cleanly to /learn/ course pages, which are fully accessible and return 200 responses with complete content.
Crawler Access
All 11 tested user agents (10 AI crawlers: GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, ChatGPT-User, anthropic-ai, Bytespider, Applebot-Extended, and Perplexity-User, plus a browser baseline) receive a 200 status with ~800 KB of full HTML from the homepage. There is no UA-based blocking, no Cloudflare challenge, and no JS shell. The site runs on AWS CloudFront and an Envoy proxy with an Express backend. The robots.txt explicitly names GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, and ChatGPT-User, disallowing only /lecture/. Because those URLs 301-redirect to /learn/, which is not disallowed, the restriction has no practical effect on what these crawlers can reach. The llms.txt exists and is massive (3.2 MB), listing thousands of article URLs, which makes it one of the largest observed in the wild.
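A check of this kind can be approximated with a short script. The sketch below is a minimal illustration, not the audit's actual tooling: the function names are hypothetical, and real crawlers send longer User-Agent strings than the bare tokens a caller might pass in. It requests one URL under several User-Agent values and flags any agent whose status or body size diverges from a browser baseline.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, int]  # (HTTP status, body size in bytes)


def fetch(url: str, user_agent: str) -> Result:
    """Fetch url with the given User-Agent header; return status and body size."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # urlopen follows redirects, so the status is that of the final response.
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; record the status instead of failing.
        return error.code, 0


def probe(url: str, user_agents: List[str],
          fetcher: Callable[[str, str], Result] = fetch) -> Dict[str, Result]:
    """Request the same URL once per user agent and collect the results."""
    return {agent: fetcher(url, agent) for agent in user_agents}


def flag_divergent(results: Dict[str, Result], baseline: str,
                   min_ratio: float = 0.5) -> List[str]:
    """Return agents whose status differs from the baseline, or whose body
    is smaller than min_ratio times the baseline body."""
    base_status, base_size = results[baseline]
    return [
        agent
        for agent, (status, size) in results.items()
        if agent != baseline
        and (status != base_status or size < base_size * min_ratio)
    ]
```

The `fetcher` parameter is injectable so the comparison logic can be exercised without network access. The 0.5 size ratio is an arbitrary threshold chosen for illustration.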
Cold-Knowledge Gap
The LLM prior knows Coursera as a platform founded in 2012 by Andrew Ng and Daphne Koller, a public company (NYSE: COUR) offering Professional Certificates from Google and IBM as well as online degrees. However, the prior cites 2023 layoffs and a "shift toward corporate-focused offerings" as recent reputational signals, a narrative the site itself never addresses. The site's /about page instead highlights a recent combination with Udemy ("Coursera recently combined with Udemy to create one of the world's most comprehensive skills development platforms"), a major M&A event the cold model does not mention at all. The site also emphasizes B Corp status and social-impact programs (refugee, veteran, and justice-impacted partners), none of which appear in the model's prior.
Schema Posture
The homepage carries FAQPage and Organization schema with 7 well-structured Q&A entries covering accreditation, pricing, and career outcomes. Course pages (e.g., /learn/machine-learning) use Course schema with aggregateRating, syllabusSections, totalHistoricalEnrollment (1.19M), review entries, and educationalCredentialAwarded — among the most complete course schema implementations seen. The /degrees page uses ItemList schema for program catalogs. However, the /business enterprise page has zero JSON-LD schema — no Product, Service, or Organization markup for the B2B offering. The /about page uses only a bare WebPage schema, missing Organization with founding date, founders, and sameAs links that the articles hub does include.
External Signals
The DNS TXT records reference OpenAI, Anthropic, Stripe, HubSpot, Canva, Docker, Zoom, and Miro, which points to a broad AI/tech vendor stack. The site's footer links externally to Udemy (a competitor, now reportedly combined with Coursera). The articles hub (/articles) serves as a massive SEO content engine with 30+ categories, and the llms.txt exposes thousands of these articles to AI crawlers in plain text, a strong signal for AI-driven discovery. The blog lives on a separate subdomain (blog.coursera.org), which may fragment the AI knowledge graph.
Findings
Robots.txt disallow of /lecture/ is ineffective due to redirects (Severity: Medium)
The robots.txt disallows GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, and ChatGPT-User from /lecture/ URLs, but those URLs 301-redirect to /learn/ pages, which are fully accessible. The restriction therefore has no practical blocking effect.
What to change: Remove the /lecture/ disallow rules from robots.txt since they have no effect, or update them to match the actual accessible paths.
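To see how a compliant crawler would read rules of this shape, the sketch below evaluates a sample robots.txt with Python's standard `urllib.robotparser`. The sample rules and the lecture path are illustrative assumptions, not a copy of Coursera's actual file or URLs. It shows that a /lecture/ disallow leaves /learn/ paths open, so it protects nothing when the same content is served under /learn/.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Illustrative rules only; not a copy of Coursera's actual robots.txt.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /lecture/

User-agent: ClaudeBot
Disallow: /lecture/
"""


def allowed_paths(robots_text: str, user_agent: str,
                  paths: List[str]) -> Dict[str, bool]:
    """Map each path to whether user_agent may fetch it under robots_text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Hypothetical paths used only to exercise the rules above.
PATHS = ["/lecture/machine-learning/intro", "/learn/machine-learning"]

gptbot = allowed_paths(SAMPLE_ROBOTS, "GPTBot", PATHS)
unnamed = allowed_paths(SAMPLE_ROBOTS, "SomeOtherBot", PATHS)

print(gptbot)   # /lecture/ path is False, /learn/ path is True
print(unnamed)  # both True: no rule group matches this agent
```

If the intent is to keep lecture content away from these crawlers, the rule would need to cover the paths that actually serve it. If not, the rule can simply be dropped.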
Enterprise /business page lacks any JSON-LD schema (Severity: High)
The /business page has zero JSON-LD schema markup, missing Product, Service, or Organization schema that would help AI crawlers understand the B2B offering.
What to change: Add Product or Service schema to the /business page describing the enterprise learning platform, including pricing tiers, features, and target audience.
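As a rough illustration of what such markup could look like, the sketch below builds a schema.org `Service` object and wraps it in a JSON-LD script tag. The function names and every field value are placeholders. Pricing tiers and feature lists are deliberately left out because the audit does not state them.

```python
import json


def business_service_jsonld(name: str, description: str, url: str,
                            provider_name: str, audience_type: str) -> str:
    """Build a schema.org Service description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Service",
        "name": name,
        "description": description,
        "url": url,
        "provider": {"@type": "Organization", "name": provider_name},
        "audience": {"@type": "BusinessAudience", "audienceType": audience_type},
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap a JSON-LD string in the script tag used to embed it in HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Placeholder values; replace with the real copy from the /business page.
markup = business_service_jsonld(
    name="Coursera for Business",
    description="Enterprise learning platform (placeholder description).",
    url="https://coursera.org/business",
    provider_name="Coursera",
    audience_type="Enterprises and teams",
)
print(as_script_tag(markup))
```

`Service` is used here instead of `Product` only because the page describes a B2B offering. Either type would address the finding.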
About page uses only bare WebPage schema (Severity: Medium)
The /about page contains only a minimal WebPage schema, missing Organization markup with founding date, founders, sameAs links, and other rich metadata that the articles hub includes.
What to change: Add Organization schema to the /about page with founding date, founders, sameAs links, and B Corp status.
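A minimal sketch of such an Organization object follows. The founding year and founders come from the cold-knowledge summary in this report. The `sameAs` URL is a placeholder. Expressing B Corp status through schema.org's `hasCertification` property is an assumption about the best-fitting vocabulary, not something the audit prescribes.

```python
import json
from typing import List


def organization_jsonld(name: str, url: str, founding_date: str,
                        founders: List[str], same_as: List[str],
                        certification_name: str) -> str:
    """Build a schema.org Organization description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "foundingDate": founding_date,
        "founder": [{"@type": "Person", "name": person} for person in founders],
        "sameAs": same_as,
        # Assumption: hasCertification is one way to express B Corp status.
        "hasCertification": {
            "@type": "Certification",
            "name": certification_name,
        },
    }
    return json.dumps(data, indent=2)


about_markup = organization_jsonld(
    name="Coursera",
    url="https://coursera.org/about",
    founding_date="2012",
    founders=["Andrew Ng", "Daphne Koller"],
    # Placeholder; list the organization's real official profiles here.
    same_as=["https://example.com/coursera-profile"],
    certification_name="Certified B Corporation",
)
print(about_markup)
```

Reusing the same field values as the Organization markup already present elsewhere on the site would keep the entity description consistent across pages.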
Cold LLM knowledge lacks recent Udemy combination (Severity: High)
The LLM prior does not mention the combination with Udemy, which the site prominently highlights on its /about page. This gap means AI-generated summaries may miss a major corporate development.
What to change: Feature the Udemy combination in structured data and on key pages so that AI crawlers and retrieval systems encounter it consistently.
Cold LLM knowledge lacks B Corp status and social impact programs (Severity: Medium)
The LLM prior does not mention Coursera's B Corp certification or social impact programs (refugee, veteran, justice-impacted partners), which the site emphasizes. This may affect AI-generated brand perception.
What to change: Add B Corp and social impact details to structured data on the /about page and homepage.
Blog hosted on separate subdomain fragments AI knowledge graph (Severity: Low)
The blog lives at blog.coursera.org, a separate subdomain from the main site. This may fragment the AI knowledge graph and reduce the authority transfer between the blog and core domain.
What to change: Consider moving the blog to a subdirectory (coursera.org/blog) to consolidate domain authority and improve AI crawler efficiency.
Articles hub page returns minimal content (31 words) (Severity: Medium)
The /articles page returns only 31 words of visible content, likely relying on JavaScript to load articles. This may limit AI crawlers' ability to index the full article catalog.
What to change: Ensure the /articles page includes server-rendered content or a static list of article links for crawlers.
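One way to verify a fix is to count what a non-JavaScript client actually sees in the raw HTML. The sketch below is a minimal approximation using Python's standard `html.parser`. How the audit itself counted words is not stated, so the class, function, and sample markup here are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import Tuple


class VisibleTextCounter(HTMLParser):
    """Count words outside script/style/template and count <a href> links."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words = 0
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside skipped elements counts as visible.
        if self._skip_depth == 0:
            self.words += len(data.split())


def count_visible(html: str) -> Tuple[int, int]:
    """Return (visible word count, link count) for a raw HTML document."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.words, counter.links


# Hypothetical JS-shell page: almost all content would arrive via script.
SAMPLE = (
    "<html><head><style>p{}</style><script>var a = 1;</script></head>"
    "<body><h1>Articles</h1><div id='root'></div>"
    "<a href='/articles/x'>Read more</a></body></html>"
)
print(count_visible(SAMPLE))  # (3, 1)
```

A server-rendered hub page should show both numbers rising well above those of a JavaScript shell when the same check is run on its raw response.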
Footer links to Udemy, a competitor, may dilute brand signals (Severity: Low)
The site's footer contains an external link to Udemy, which is a competitor (and reportedly now combined with Coursera). This may create confusing signals for AI crawlers about brand relationships.
What to change: Remove or nofollow the external link to Udemy in the footer to avoid diluting brand authority.
What's working
- All 11 tested user agents receive full HTML from homepage: Every tested AI crawler, along with the browser baseline, gets a 200 response with ~800 KB of full HTML from the homepage, with no UA-based blocking or JS shell.
- llms.txt is one of the largest observed at 3.2 MB: The llms.txt file lists thousands of article URLs, providing AI crawlers with a comprehensive plain-text index of content.
- Course pages have comprehensive Course schema with ratings and enrollment: Course pages like /learn/machine-learning include Course schema with aggregateRating, syllabusSections, totalHistoricalEnrollment (1.19M), and educationalCredentialAwarded.
- Homepage includes FAQPage and Organization schema with 7 Q&A entries: The homepage carries FAQPage and Organization schema covering accreditation, pricing, and career outcomes.
- Degrees page uses ItemList schema for program catalog: The /degrees page uses ItemList schema to structure its program catalog, aiding AI crawlers in understanding available degrees.
- DNS TXT records reference OpenAI, Anthropic, and other AI vendors: The DNS TXT records reference OpenAI, Anthropic, Stripe, HubSpot, Canva, Docker, Zoom, and Miro, pointing to a broad AI/tech vendor stack.
- Articles hub serves as massive SEO content engine with 30+ categories: The /articles hub has 30+ categories and thousands of articles exposed via llms.txt, providing strong AI-driven discovery signals.
- How Coursera Works page provides clear onboarding content: The /about/how-coursera-works page offers 537 words of explanatory content, helping AI crawlers understand the platform's value proposition.