AI Site Grade

speak.com — AI Site Grade

Speak.com allows all major AI crawlers and publishes an llms.txt file, but key blog milestone URLs return 404, no page examined carries JSON-LD schema, and cold LLM knowledge lags 2-3 years behind the site, underrepresenting its $1B valuation and scale.

Findings: 12
Evidence checks: 24
Completed: 30 May 2026

Analysis

Speak.com — AI-Visibility Audit

The blog lists a post announcing a $78M Series C at a $1B valuation (December 2024), but the post's long slug, /blog/a-new-milestone-as-we-bring-language-learning-to-all-raising-78m-series-c-at-a-1b-valuation, returns a 404, while a shorter slug, /blog/series-c, works. This is a canonicalization failure: AI crawlers that follow the sitemap URL reach a dead page. Similarly, /blog/speak-hits-500m-valuation-expands-rapidly-across-markets, which is listed on the blog index, also returns a 404.

Crawler Access

All major AI crawlers — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User, Applebot-Extended, anthropic-ai — receive a full 200 response with identical content (112KB) to the browser baseline. Bytespider is the sole bot blocked (403) by Cloudflare. The robots.txt has no AI-specific directives; only a generic Disallow: */event/ rule. An llms.txt exists and is well-structured, listing core navigation, B2B pages, language regions, support resources, and legal pages — a rare and commendable implementation.
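
A check like the one summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the audit tool's actual method: `classify_access`, the injected `fetch` callable, and the 5% size tolerance are all hypothetical, and `stub_fetch` replays the figures reported above instead of making real HTTP requests.

```python
from typing import Callable, Dict, Tuple

# User-agent tokens named in the audit. Real crawlers send longer strings.
AI_AGENTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "OAI-SearchBot", "ChatGPT-User", "Applebot-Extended",
    "anthropic-ai", "Bytespider",
]

# fetch(url, user_agent) -> (HTTP status, body size in bytes)
Fetch = Callable[[str, str], Tuple[int, int]]


def classify_access(fetch: Fetch, url: str, baseline_size: int,
                    tolerance: float = 0.05) -> Dict[str, str]:
    """Compare each agent's response with the browser baseline."""
    report = {}
    for agent in AI_AGENTS:
        status, size = fetch(url, agent)
        if status != 200:
            report[agent] = f"blocked ({status})"
        elif abs(size - baseline_size) <= tolerance * baseline_size:
            report[agent] = "full"
        else:
            report[agent] = "partial"
    return report


def stub_fetch(url: str, user_agent: str) -> Tuple[int, int]:
    """Hypothetical stand-in that replays the results reported in this audit."""
    if user_agent == "Bytespider":
        return 403, 0
    return 200, 112_000  # roughly the 112KB baseline


report = classify_access(stub_fetch, "https://speak.com/", baseline_size=112_000)
```

Injecting `fetch` keeps the comparison logic separate from the network layer, so the same function could be pointed at a real HTTP client later.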

Schema Posture

Zero JSON-LD schema exists on any page examined — homepage, blog, B2B, careers, Korean AI tutor page. No Organization, WebSite, FAQPage, Product, or SoftwareApplication markup. The homepage has an FAQ section (7 questions) but no FAQPage schema. The B2B page has feature comparisons and testimonials but no structured data. This is a critical gap for AI engines extracting entity relationships.
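
As a rough illustration of what closing this gap could look like, the sketch below builds an Organization object and wraps it in a script element for the page head. The `as_script_tag` helper is hypothetical, the logo and sameAs values are placeholders, and the https URL form is an assumption. None of it reflects markup that exists on the site today.

```python
import json

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Speak",
    "url": "https://speak.com/",  # assumption: canonical scheme and host
    # Placeholder values below: replace with the real logo and profile URLs.
    "logo": "https://example.com/logo.png",
    "sameAs": ["https://example.com/company-profile"],
}


def as_script_tag(data: dict) -> str:
    """Serialize a JSON-LD object into an embeddable <script> element."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a stray "</script>" in the data cannot end the element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = as_script_tag(organization)
```

The same helper would work for any of the other schema types named above, since each is just a different dictionary.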

Cold-Knowledge Gap

The cold LLM (a model answering from its training data alone, with no live retrieval) knows Speak as a 2016-founded app by Connor Zwick and Andrew Hsu, focused on English learners in Asia, with a $20M Series B in 2022. The site's actual story is far more advanced: $162M raised total, a $1B valuation (Series C led by Accel, December 2024), 15M+ downloads, 200+ B2B customers, a team of 130 people across 5 offices, and 3.74 billion lines spoken in 2025 (111% YoY growth). The cold model is roughly 2-3 years out of date on funding, scale, and product maturity.

Content & External Signals

The blog is active and substantive: posts from as recently as May 2026 cover agentic coding, voice agent platforms, and a partnership with the Washington Nationals. The site serves 10 language-localized versions, including Korean, Japanese, Spanish, Chinese, Portuguese, French, and German. DNS records show an OpenAI domain verification TXT record (openai-domain-verification=dv-...), which indicates that the domain has been verified with OpenAI; on its own, the record does not confirm a formal partnership. The B2B page uses a HubSpot form for demos. Privacy policy and terms live on Notion rather than the main domain, a minor fragmentation risk for crawlers.

Findings

  1. Key blog milestone URLs return 404 errors High

    The blog post announcing the $78M Series C at a $1B valuation has a canonical URL that returns 404, while a shorter slug works. Another post about a $500M valuation also 404s. AI crawlers hitting the sitemap or blog index encounter dead pages for these important milestones.

    What to change: Redirect the broken URLs to the working slugs (e.g., /blog/series-c) or update the sitemap and blog index to point only to working URLs.
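
    The redirect option could be sketched as a small lookup table. This is only an illustration: `REDIRECTS` and `resolve` are hypothetical names, and how the site's hosting stack actually configures redirects is not known from this audit.

```python
from typing import Optional, Tuple

# Dead slug -> working slug, as reported in this audit.
REDIRECTS = {
    "/blog/a-new-milestone-as-we-bring-language-learning-to-all-"
    "raising-78m-series-c-at-a-1b-valuation": "/blog/series-c",
}


def resolve(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) for a known dead path, else None."""
    normalized = path.rstrip("/") or "/"  # treat trailing slashes as equivalent
    target = REDIRECTS.get(normalized)
    return (301, target) if target is not None else None
```

    The $500M valuation slug is left out of the table because the audit does not identify a working URL for that post.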

  2. No JSON-LD structured data on any page High

    Every page examined — homepage, blog, B2B, careers, Korean AI tutor page — lacks JSON-LD schema markup. No Organization, WebSite, FAQPage, Product, or SoftwareApplication schema is present, even where content (e.g., FAQ section on homepage) would naturally support it.

    What to change: Add JSON-LD schema for Organization, WebSite, FAQPage (homepage), Product/SoftwareApplication (app), and BreadcrumbList across all pages.

  3. Cold LLM knowledge is 2-3 years out of date High

    The cold LLM knows Speak as a 2016-founded app with a $20M Series B in 2022, focused on English learners in Asia. The site's actual story includes $162M total raised, a $1B valuation (Series C in December 2024), 15M+ downloads, 200+ B2B customers, and 3.74 billion lines spoken in 2025. This gap means AI-generated summaries underrepresent the company's scale and credibility.

    What to change: Publish a structured data-rich press release or funding announcement page with schema markup, and ensure the homepage and about page contain current funding and scale data in plain text.

  4. Bytespider crawler is blocked by Cloudflare Medium

    Bytespider (ByteDance's crawler) receives a 403 response from Cloudflare, while all other major AI crawlers are allowed. This blocks content indexing by ByteDance's AI products.

    What to change: Allow Bytespider access by adjusting Cloudflare WAF rules or robots.txt to permit its user-agent.

  5. Privacy policy and terms hosted on Notion subdomain Low

    The privacy policy and terms of service are hosted on a Notion subdomain rather than the main speak.com domain. This fragments content and may reduce crawl priority or trust signals for AI crawlers.

    What to change: Host privacy policy and terms on the main domain (e.g., /privacy, /terms) to consolidate authority and crawl efficiency.

  6. Homepage FAQ section lacks FAQPage schema Medium

    The homepage contains an FAQ section with 7 questions, but no FAQPage JSON-LD markup is present. This is a missed opportunity for rich results in AI-generated answers.

    What to change: Add FAQPage schema to the homepage FAQ section.
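
    A minimal sketch of generating that markup, assuming the FAQ content is available as question and answer pairs. The `faq_page` helper is hypothetical, and the example text is placeholder content rather than the site's actual questions.

```python
import json
from typing import Dict, List, Tuple


def faq_page(pairs: List[Tuple[str, str]]) -> Dict:
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder text: copy the questions and answers verbatim from the page.
faq = faq_page([
    ("Example question 1?", "Example answer 1."),
    ("Example question 2?", "Example answer 2."),
])
faq_json = json.dumps(faq, indent=2)
```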

  7. Blog posts lack Article schema Medium

    Blog pages do not include Article or BlogPosting schema markup, reducing their visibility in AI-generated news summaries and knowledge panels.

    What to change: Add Article or BlogPosting schema to all blog posts with headline, datePublished, author, and image.
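
    The four recommended properties could be assembled as below. This is a sketch only: `blog_posting` is a hypothetical helper, every value in the example call is a placeholder, and treating the author as an Organization is an assumption (a Person may be more accurate for bylined posts).

```python
from datetime import date
from typing import Dict


def blog_posting(headline: str, published: date, author_name: str,
                 image_url: str, url: str) -> Dict:
    """Build schema.org BlogPosting markup for a single post."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        # date.isoformat() yields the ISO 8601 form, e.g. "2026-01-15".
        "datePublished": published.isoformat(),
        "author": {"@type": "Organization", "name": author_name},
        "image": image_url,
        "mainEntityOfPage": url,
    }


# Placeholder values: substitute each post's real metadata.
post = blog_posting(
    headline="Example post title",
    published=date(2026, 1, 15),
    author_name="Speak",
    image_url="https://example.com/cover.png",
    url="https://example.com/blog/example-post",
)
```

    Passing a `date` object rather than a free-form string keeps datePublished in a consistent, machine-readable format.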

  8. No BreadcrumbList schema on any page Low

    No page includes BreadcrumbList structured data, which helps AI crawlers understand site hierarchy and navigation paths.

    What to change: Add BreadcrumbList schema to all pages with breadcrumb navigation.
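
    One way to derive the markup from a page's navigation trail is sketched below. `breadcrumb_list` is a hypothetical helper. The /blog and /blog/series-c paths come from this audit, while the display names and the https URL form are assumptions.

```python
from typing import Dict, List, Tuple


def breadcrumb_list(trail: List[Tuple[str, str]]) -> Dict:
    """Build schema.org BreadcrumbList markup from (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            # Positions are 1-based and follow the order of the trail.
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


crumbs = breadcrumb_list([
    ("Home", "https://speak.com/"),
    ("Blog", "https://speak.com/blog"),
    ("Series C", "https://speak.com/blog/series-c"),
])
```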

  9. No SoftwareApplication schema for the app Medium

    The site does not use SoftwareApplication schema to describe the Speak app, which would help AI engines present app details (rating, price, features) in search results and AI answers.

    What to change: Add SoftwareApplication schema on the homepage and app download pages with name, description, applicationCategory, operatingSystem, and offers.
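
    A sketch of the recommended object, with a small completeness check for the five properties listed above. The description, operating systems, and free-download offer are assumptions that should be replaced with verified values, and `missing_fields` is a hypothetical helper.

```python
from typing import Dict, List

software_application = {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "Speak",
    "description": "AI-powered language learning app.",  # placeholder copy
    "applicationCategory": "EducationalApplication",
    "operatingSystem": "iOS, Android",  # assumption: confirm supported platforms
    # Assumption: free download. Adjust price and currency to the real offer.
    "offers": {"@type": "Offer", "price": "0", "priceCurrency": "USD"},
}

# The properties named in the recommendation above.
RECOMMENDED = ("name", "description", "applicationCategory",
               "operatingSystem", "offers")


def missing_fields(data: Dict) -> List[str]:
    """List recommended properties that are absent or empty."""
    return [field for field in RECOMMENDED if not data.get(field)]
```

    Running `missing_fields` in a build step would flag pages whose markup drops one of the recommended properties.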

  10. B2B page lacks Product or Service schema Medium

    The B2B page describes enterprise language learning features and pricing but has no Product or Service schema, limiting its ability to appear in AI-generated B2B comparisons.

    What to change: Add Product or Service schema to the B2B page with name, description, and offers. Add aggregateRating only if real rating data exists; testimonials alone do not supply a rating value or count.

  11. No Organization schema on any page Medium

    No page includes Organization schema with company details (founding date, funding, social profiles), which would help AI knowledge panels display accurate information.

    What to change: Add Organization schema to the homepage and about page with name, url, logo, foundingDate, and sameAs links.

  12. Limited external web presence for key queries Low

    Web searches for 'Speak.com language app reviews 2025', 'Speak language learning AI tutor review', and 'Speak language learning unicorn 2024' returned zero results. This may indicate low external signal volume, which could affect AI crawler trust and citation frequency, although zero results across all three queries may also reflect a limitation of the search tool used.

    What to change: Encourage user reviews on third-party platforms and pursue PR coverage to increase external mentions and backlinks.

What's working

  • llms.txt file is well-structured and comprehensive — Speak.com publishes an llms.txt file that lists core navigation, B2B pages, language regions, support resources, and legal pages. This is a rare and commendable implementation that helps AI crawlers discover key content efficiently.
  • All major AI crawlers receive full access — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User, Applebot-Extended, and anthropic-ai all receive a 200 response with identical content to the browser baseline. No AI-specific blocking in robots.txt.
  • OpenAI domain verification TXT record present: DNS records include an OpenAI domain verification TXT record, which indicates that the domain has been verified with OpenAI. This suggests an active integration, though the record alone does not confirm a formal partnership.
  • Blog is active with recent, substantive posts — The blog features posts from as recently as May 2026 covering agentic coding, voice agent platforms, and a partnership with the Washington Nationals. Fresh content signals to AI crawlers that the site is actively maintained.
  • 10 language-localized versions of the site — The site serves content in Korean, Japanese, Spanish, Chinese, Portuguese, French, German, and more. This broadens the site's reach to international AI crawlers and users.
  • B2B page includes a HubSpot demo request form — The B2B page has a HubSpot form for scheduling demos, which can capture leads and provide structured data for AI crawlers if properly marked up.
  • Robots.txt is clean with no AI-specific blocks — The robots.txt file has no AI-specific disallow rules, only a generic rule for event pages. This ensures AI crawlers are not inadvertently blocked.
  • Site is archived in Wayback Machine — A Wayback Machine snapshot from May 2026 is available, providing a historical record that AI crawlers can reference for content verification.
