AI Site Grade
greenhouse.com — AI Site Grade
Greenhouse is technically open to all major AI crawlers, but its cold LLM knowledge is 18 months stale, missing the AI product suite and Ezra AI Labs acquisition.
- Findings: 8
- Evidence checks: 19
- Completed: 30 May 2026
Analysis
Greenhouse: AI crawlers get full content, but cold LLM knowledge is stuck in 2024
The site is technically wide open to every major AI crawler — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User, and anthropic-ai all receive a 200 with the full 192KB HTML page, identical to a browser. The only blocked bot is Bytespider (TikTok/Bytedance), which gets a Cloudflare 403. This is an unusually permissive posture: most enterprise SaaS sites block at least GPTBot or Google-Extended. Greenhouse does not.
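A check like the one described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool that produced this report: the bare tokens in AI_BOT_TOKENS stand in for the longer user-agent strings real crawlers send, and the helper names (http_status, probe_bots, blocked_bots) are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Simplified user-agent tokens taken from the report. Real crawlers send
# longer strings, so treat these as illustrative placeholders.
AI_BOT_TOKENS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "OAI-SearchBot", "ChatGPT-User", "anthropic-ai", "Bytespider",
]


def http_status(url: str, user_agent: str) -> int:
    """Return the HTTP status of a plain GET sent with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. a 403 from a CDN-level block


def probe_bots(url: str, fetch: Callable[[str, str], int] = http_status) -> Dict[str, int]:
    """Map each bot token to the status code it receives for `url`."""
    return {token: fetch(url, token) for token in AI_BOT_TOKENS}


def blocked_bots(statuses: Dict[str, int]) -> List[str]:
    """List the tokens that did not receive a 2xx response."""
    return sorted(t for t, code in statuses.items() if not 200 <= code < 300)
```

Calling blocked_bots(probe_bots(url)) against a live URL would list the tokens that were refused. The fetch parameter is injectable so the logic can be exercised without network access.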
Crawler Access
The robots.txt is minimal: a single User-agent: * rule disallowing only /*_page= (a Webflow pagination parameter), with no AI-bot-specific directives. No llms.txt exists (the request returns 404). The site runs on Webflow as the CMS behind Cloudflare's CDN, with DNS hosted on AWS Route 53. The homepage and all key pages render full server-side HTML, so there is no JS-shell risk. The sitemap.xml is healthy, listing 3,022 URLs across localized (UK, DE) and English pages. The blog, pricing, compare, and AI-recruiting pages all return substantial visible text (500–1,000+ words) on a plain GET.
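The robots.txt posture described above can be checked offline with Python's standard-library parser. This is a sketch under assumptions: ROBOTS_TXT is reconstructed from the description in this report rather than copied from the live file, and example.com stands in for the real host.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the report's description; the live file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*_page=
"""


def allowed_agents(robots_txt: str, agents: list, url: str) -> dict:
    """Return {agent: True/False} for whether each agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# No AI-specific groups exist, so every bot falls through to the "*" group.
HOMEPAGE_ACCESS = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "PerplexityBot"],
    "https://example.com/",  # placeholder host
)
```

One caveat: the standard-library parser compares paths as literal prefixes, so it is not a reliable way to test the wildcard pagination rule itself. It is only used here to confirm that no bot-specific group shuts out the homepage.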
Cold-Knowledge Gap
The cold LLM knows Greenhouse as a cloud-based ATS founded in 2012 by Daniel Chait and Jon Stross, acquired by TPG in 2021, with notable clients Airbnb/Uber/Stripe. It mentions "Structured Hiring" as a differentiator and notes 2023–2024 layoffs and competition from AI-driven platforms. This knowledge is stale by roughly 18 months. The site today is aggressively repositioning around Greenhouse AI — 20+ AI features, ISO 42001:2023 certification, monthly bias audits by Warden AI, and the May 2026 acquisition of Ezra AI Labs for conversational AI interviews. The cold model knows nothing about Ezra, the AI product suite, "Real Talent" (fraud detection + identity verification + candidate matching), or "MyGreenhouse" candidate portal. The gap between the model's prior (a legacy ATS with layoff baggage) and the site's current narrative (an AI-first hiring platform with acquisition momentum) is the single largest visibility problem.
Schema Posture
The homepage carries a SoftwareApplication schema with applicationSubCategory: "Applicant Tracking System", an AggregateRating (4.5/5 with ratingCount: 1, which is suspiciously low), and a single review from Ocado Group. The AI-recruiting page has a rich FAQPage schema with 7 questions covering bias, integration, compliance, and candidate experience, which is a strength. The pricing page also has FAQPage with 4 questions. However, the blog index page has zero schema: no BlogPosting, no Article, no BreadcrumbList. Individual blog posts (like the Ezra acquisition) only carry BreadcrumbList, missing Article or NewsArticle schema entirely. The compare page has WebPage with embedded reviews but no Product schema or other structured representation of the comparison (schema.org defines no dedicated comparison type).
External Signals
Web search returned zero indexed results for Greenhouse reviews, G2 mentions, or Reddit discussions. This is likely a DuckDuckGo limitation rather than a real absence. The site prominently claims "#1 ATS on G2" for enterprise, mid-market, and Europe (Spring 2026 Grid Reports), and cites a 36% recruiter efficiency improvement study. The Ezra AI Labs acquisition (announced May 5, 2026) is the most significant recent signal, a move that directly addresses the "AI Doom Loop" concept the blog introduces. No press coverage of this acquisition appeared in search results, which may indicate limited external amplification, although the same search limitation applies here.
Surprising Findings
The Ezra acquisition blog post is wrapped in form prompts rather than truly gated: the full text loads, but the page includes multiple "Download this Article" and "fill out the form" interstitials. An AI crawler fetching this URL gets the full content (950 words), while human readers hit friction. The aggregateRating on the homepage schema shows ratingCount: 1, meaning a single review powers the star rating for a company claiming #1 ATS on G2; this is a schema quality issue. The robots.txt blocks no AI bots and disallows only a pagination parameter, which is fine for crawlability but gives AI engines no guidance on which content to prioritize. No llms.txt means no AI-friendly content map exists, a missed opportunity given the depth of structured product documentation available.
Findings
Cold LLM knowledge is 18 months stale, missing AI product suite and Ezra acquisition (severity: High)
The cold LLM knows Greenhouse as a legacy ATS with layoff baggage, but knows nothing about the 20+ AI features, ISO 42001 certification, Warden AI audits, or the May 2026 Ezra AI Labs acquisition. This gap between the site's current narrative and the model's prior is the largest visibility problem.
What to change: Publish an llms.txt file that summarizes the AI product suite, certifications, and key acquisitions. Ensure high-value pages like the AI recruiting page and Ezra acquisition blog post are listed in the sitemap and well linked from other pages on the site.
No llms.txt file exists, missing opportunity to guide AI crawlers (severity: Medium)
The site returns a 404 for llms.txt, meaning no AI-friendly content map is provided. Given the depth of structured product documentation, this is a missed opportunity to help LLMs prioritize the most important content.
What to change: Create an llms.txt file that lists key pages (AI recruiting, pricing, blog, etc.) with brief descriptions to guide AI crawlers.
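As an illustration of what such a file might contain, the sketch below renders an llms.txt body in the layout used by the community llms.txt proposal (a title, a one-line summary, then sections of annotated links). All URLs are placeholders on example.com, the section names are hypothetical, and the descriptions only restate facts from this report.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, link sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder URLs: the real page paths are not given in this report.
LLMS_TXT = render_llms_txt(
    "Greenhouse",
    "Hiring platform with an AI product suite.",
    {
        "Product": [
            ("AI recruiting", "https://example.com/ai-recruiting",
             "20+ AI features, ISO 42001:2023 certification, monthly bias audits by Warden AI"),
            ("Pricing", "https://example.com/pricing", "Plans and pricing FAQ"),
        ],
        "News": [
            ("Ezra AI Labs acquisition", "https://example.com/blog/ezra",
             "Announced May 5, 2026; conversational AI interviews"),
        ],
    },
)
```

Writing LLMS_TXT to a file served at /llms.txt would replace the current 404. Which pages to list, and in what order, is an editorial decision this sketch does not make.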
Homepage SoftwareApplication schema shows ratingCount of 1 (severity: Medium)
The aggregateRating on the homepage schema has ratingCount: 1, which is suspiciously low for a company claiming #1 ATS on G2. This undermines the credibility of the schema data.
What to change: Update the aggregateRating to reflect the actual number of reviews from G2 or other reputable sources, or remove the rating if it cannot be accurately represented.
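A simple lint can catch this kind of issue before it ships. The sketch below is an assumption-laden example: SAMPLE_JSON_LD is reconstructed from the values quoted in this report (the live markup may carry more fields), and the min_count threshold of 10 is arbitrary.

```python
import json

# Reconstructed from values quoted in the report; not the site's full markup.
SAMPLE_JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "applicationSubCategory": "Applicant Tracking System",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.5",
    "ratingCount": "1"
  }
}
"""


def rating_warnings(json_ld: str, min_count: int = 10) -> list:
    """Flag an aggregateRating whose ratingCount falls below `min_count`."""
    data = json.loads(json_ld)
    rating = data.get("aggregateRating") or {}
    if not rating:
        return []
    # Fall back to reviewCount when ratingCount is absent.
    count = int(rating.get("ratingCount", rating.get("reviewCount", 0)))
    if count < min_count:
        return [f"ratingCount is {count}, below the threshold of {min_count}"]
    return []
```

Running rating_warnings(SAMPLE_JSON_LD) flags the single-review rating described above. A check like this could run alongside other structured-data validation when pages are published.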
Blog index page has zero schema markup (severity: Medium)
The blog index page lacks any schema (no BlogPosting, Article, or BreadcrumbList), reducing its visibility in rich results and AI crawler understanding.
What to change: Add Blog or CollectionPage schema to the blog index, listing its posts as BlogPosting items, and add Article or NewsArticle schema to individual blog posts.
Individual blog posts missing Article or NewsArticle schema (severity: Medium)
Blog posts like the Ezra acquisition announcement only carry BreadcrumbList schema, missing Article or NewsArticle markup that would help search engines and AI crawlers understand the content type.
What to change: Add Article or NewsArticle schema to all blog posts, including headline, datePublished, author, and image.
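A minimal sketch of the recommended markup, generated in Python, might look like the following. The headline, image URL, and page URL are hypothetical placeholders; the date reuses the announcement date cited in this report and is assumed to match the publication date; the Organization author is likewise an assumption.

```python
import json


def news_article_json_ld(headline: str, date_published: str, author_name: str,
                         image_url: str, page_url: str) -> dict:
    """Build a schema.org NewsArticle object with the fields named above."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date
        "author": {"@type": "Organization", "name": author_name},
        "image": [image_url],
        "mainEntityOfPage": page_url,
    }


def as_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the script element that goes in the page head."""
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"


# Hypothetical values for illustration only.
EZRA_POST = news_article_json_ld(
    headline="Greenhouse acquires Ezra AI Labs",
    date_published="2026-05-05",
    author_name="Greenhouse",
    image_url="https://example.com/images/ezra.png",
    page_url="https://example.com/blog/ezra",
)
```

The output of as_script_tag(EZRA_POST) would sit next to the existing BreadcrumbList markup rather than replace it.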
Compare page lacks Product schema (severity: Low)
The compare page uses WebPage schema with embedded reviews but no Product schema, so the products being compared are not described in structured form.
What to change: Add Product schema with brand and offers for each product being compared. schema.org defines no dedicated comparison type, so an ItemList of Product entries is one way to structure the comparison content.
Ezra AI Labs acquisition blog post has form interstitials (severity: Low)
The Ezra acquisition blog post includes 'Download this Article' and form interstitials that create friction for human readers, though AI crawlers still get the full content.
What to change: Consider reducing or removing form interstitials on key blog posts to improve the experience for human readers. AI crawlers already receive the full text, so this change is about reader friction rather than crawlability.
Limited external amplification of key signals like Ezra acquisition (severity: Medium)
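Web search returned no indexed results for the Ezra AI Labs acquisition or recent reviews. As noted under External Signals, the empty result set may reflect a DuckDuckGo limitation, but it may also point to limited press coverage and few external backlinks. If so, the site has weaker authority signals for AI crawlers.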
What to change: Proactively pitch the Ezra acquisition story to tech and HR media outlets, and encourage customers to leave reviews on G2 and other platforms to build external signals.
What's working
- All major AI crawlers allowed with full HTML content — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User, and anthropic-ai all receive a 200 with the full 192KB HTML page, identical to a browser. Only Bytespider is blocked.
- Key pages render full server-side HTML with no JS-shell risk — The homepage, AI recruiting, pricing, compare, and blog pages all return substantial visible text (500–1,000+ words) on a plain GET, ensuring AI crawlers can index the content without JavaScript execution.
- Sitemap.xml is healthy with 3,022 URLs — The sitemap lists 3,022 URLs across localized and English pages, ensuring comprehensive crawl coverage.
- AI recruiting page has rich FAQPage schema with 7 questions — The AI recruiting page includes FAQPage schema covering bias, integration, compliance, and candidate experience, which helps AI crawlers extract structured Q&A content.
- Pricing page has FAQPage schema with 4 questions — The pricing page includes FAQPage schema, aiding AI crawlers in understanding pricing-related queries.
- Homepage has SoftwareApplication schema with aggregate rating — The homepage carries SoftwareApplication schema with applicationSubCategory 'Applicant Tracking System' and an AggregateRating, providing basic structured data for AI crawlers.
- Robots.txt blocks no AI bots — The robots.txt has a single rule disallowing only a pagination parameter, so it places no restrictions on AI crawlers. The one blocked bot, Bytespider, is stopped by a Cloudflare 403 rather than by robots.txt.