AI Site Grade
welcometothejungle.com — AI Site Grade
Welcome to the Jungle's Next.js SPA serves a 521KB JavaScript shell to every AI crawler tested, with no visible content and no structured data, despite a permissive robots.txt.
Because it renders client-side, the site is invisible to AI crawlers. It provides no machine-readable content and exposes API keys and internal hostnames in every response, while the LLM already holds detailed prior knowledge that the site itself never delivers.
- Findings: 11
- Evidence checks: 44
- Completed: 30 May 2026
Analysis
The core finding is clear: this is a major European HR tech company whose site is entirely invisible to AI crawlers despite technically allowing them — a Next.js SPA behind AWS WAF that serves a JS shell to all bots, with no server-side rendering, no structured data, and no llms.txt. The cold LLM knowledge is actually quite detailed, creating a massive gap between what models know and what the site delivers.
The site serves a JavaScript shell to every AI crawler — zero visible content, zero schema, zero structured data
Every AI bot tested (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended, anthropic-ai, ChatGPT-User) receives a 521KB Next.js app shell from www.welcometothejungle.com containing only CSS bundle links and a <script> block with environment variables. The actual page content — job listings, company profiles, articles — is rendered entirely client-side and never reaches the crawler's HTML parser. The non-www apex domain returns a 2KB AWS WAF JS challenge to browser UAs, while AI bots get redirected to the www subdomain where they hit the empty shell.
Crawler Access
The robots.txt at www.welcometothejungle.com/robots.txt (accessible to AI bots) is permissive: User-agent: * with only /me/*, /settings/*, /users/*, and query-parameter URLs disallowed. No AI-specific directives exist: there are no GPTBot, ClaudeBot, Google-Extended, or PerplexityBot sections. The sitemap is declared at sitemaps/index.xml.gz and resolves to a valid index containing job-listing, company-pages, articles, and static-pages sub-sitemaps, but the sub-sitemaps for companies and static pages return 403 Access Denied from S3. Only the articles sitemap (7MB, thousands of URLs) is accessible. The llms.txt file returns the AWS WAF challenge to browsers and the JS shell to AI bots, so it is effectively non-existent.
Content & Schema
The homepage, /en/about, and article pages all return the identical JS shell with zero JSON-LD schema, zero meta description, zero <title> tag, zero og: tags, and zero canonical link. The only heading present is <h1>JavaScript is disabled</h1> from the noscript fallback. The site leaks internal infrastructure through the window.env object exposed in every page: Algolia app ID and API keys, Amplitude API key, GrowthBook client/decryption keys, Google OAuth client ID, and internal service hostnames (expansion_global_host, eba_api_host, etc.).
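The head-level checks described above (title, meta description, og: tags, canonical link, JSON-LD, and the lone noscript heading) can be reproduced against any raw HTML response. The following is a minimal sketch using only Python's standard library; the class and function names are illustrative and are not part of the audit tooling.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Records which crawler-facing signals appear in a raw HTML document."""

    def __init__(self):
        super().__init__()
        self.found = {
            "title": False,
            "meta_description": False,
            "og_tags": False,
            "canonical": False,
            "json_ld": False,
        }
        self.h1_texts = []  # text of every <h1>, including noscript fallbacks
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self.found["title"] = True
        elif tag == "meta":
            if (a.get("name") or "").lower() == "description":
                self.found["meta_description"] = True
            if (a.get("property") or "").lower().startswith("og:"):
                self.found["og_tags"] = True
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.found["canonical"] = True
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self.found["json_ld"] = True
        elif tag == "h1":
            self._in_h1 = True
            self.h1_texts.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_texts[-1] += data


def audit_html(html: str) -> HeadAudit:
    """Parse the HTML exactly as served, without executing any JavaScript."""
    audit = HeadAudit()
    audit.feed(html)
    return audit
```

A response in which every value of `found` is False and the only heading is the noscript fallback matches the shell described in this section.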
Cold-Knowledge Gap
The LLM knows Welcome to the Jungle in detail: founded 2015 by Jeremy Cledat and Guillaume Luccisano, French HR tech platform, employer branding focus, acquired Hunted (UK) in 2020, raised €50M Series C in 2023, serves France/UK/Germany/Spain. None of this information exists in machine-readable form on the site. The gap between the model's prior and what the site actually exposes to crawlers is total — the site provides zero factual content, zero structured data, zero brand positioning signals that an AI engine could extract.
External Signals
DuckDuckGo returns zero search results for welcometothejungle.com or any query combining the brand name with "recruitment", "job board", or "employer branding". The Wayback Machine snapshot (July 2025) also captures only the JS shell. DNS records show AWS CloudFront + Route53, Google Workspace mail, and multiple Google/Brevo/Apple verification TXT records. The site has no discoverable external press mentions, reviews, or Reddit threads in search results — a striking absence for a company that raised €50M.
Findings
All AI crawlers receive a JavaScript shell with no visible content High
Every AI bot tested (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Bytespider, Applebot-Extended, anthropic-ai, ChatGPT-User) receives a 521KB Next.js app shell containing only CSS bundle links and a script block with environment variables. Actual page content is rendered client-side and never reaches the crawler's HTML parser.
What to change: Implement server-side rendering (SSR) or static site generation (SSG) for all pages, or use dynamic rendering to serve pre-rendered HTML to AI crawlers.
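If dynamic rendering is chosen as a stopgap, the server needs a way to decide which requests get pre-rendered HTML. A minimal sketch of that decision, assuming substring matching on the User-Agent header, might look like the following. The token list comes from the bots named in this finding; Google-Extended and Applebot-Extended are left out because they are documented as robots.txt control tokens rather than distinct request user agents. The function names and the two variant labels are hypothetical.

```python
# Tokens taken from the crawlers named in this audit; extend as needed.
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "OAI-SearchBot",
    "Bytespider",
    "anthropic-ai",
    "ChatGPT-User",
)


def is_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against known crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def choose_variant(user_agent: str) -> str:
    """Return which response variant to serve (labels are illustrative)."""
    return "prerendered" if is_ai_crawler(user_agent) else "client_app"
```

Server-side rendering or static generation for all visitors avoids this branch entirely, which is why it is listed first in the recommendation.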
Zero JSON-LD schema, meta tags, or canonical links on any page High
The homepage, /en/about, and article pages all return the identical JS shell with zero JSON-LD schema, zero meta description, zero <title> tag, zero og: tags, and zero canonical link. The only heading is <h1>JavaScript is disabled</h1> from the noscript fallback.
What to change: Add JSON-LD structured data (Organization, JobPosting, Article, FAQPage, BreadcrumbList) and standard meta tags to all pages.
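As an illustration of the first item, the sketch below builds a minimal schema.org Organization object and wraps it in a script tag for server-rendered HTML. Only the name and URL are taken from this report; the helper names are hypothetical, and further properties (founding date, sameAs profiles, and so on) should be filled in from verified company data rather than from this audit.

```python
import json


def organization_jsonld(name: str, url: str, same_as=()) -> dict:
    """Build a minimal schema.org Organization object."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
    }
    if same_as:
        data["sameAs"] = list(same_as)
    return data


def as_script_tag(data: dict) -> str:
    """Serialize JSON-LD for embedding in the server-rendered <head>."""
    # Escape "</" so a string value cannot close the script element early.
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


tag = as_script_tag(
    organization_jsonld("Welcome to the Jungle", "https://www.welcometothejungle.com")
)
```

The tag only helps if it is present in the HTML the server sends; injecting it client-side would leave crawlers with the same empty shell.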
llms.txt file returns AWS WAF challenge or JS shell, not usable content High
The llms.txt file at welcometothejungle.com/llms.txt returns the AWS WAF JS challenge to browser UAs and the JS shell to AI bots, making it effectively non-existent for AI crawlers.
What to change: Create a proper llms.txt file with a summary of the site, key URLs, and guidance for AI crawlers.
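A minimal sketch of generating such a file follows, assuming the layout from the llms.txt proposal: an H1 with the site name, a blockquote summary, then H2 sections containing link lists. The function name, the summary text, and the link description are placeholders; /en/about is the only path taken from this audit.

```python
def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render llms.txt content.

    sections maps a section title to a list of (label, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for title, links in sections.items():
        lines.append(f"## {title}")
        lines.append("")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    "Welcome to the Jungle",
    "Placeholder: one or two sentences describing the company.",
    {
        "Company": [
            ("About", "https://www.welcometothejungle.com/en/about",
             "placeholder description"),
        ],
    },
)
```

The output would need to be served as a static plain-text file at /llms.txt, outside both the SPA catch-all route and the WAF challenge, or crawlers will keep receiving the shell.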
Company and static pages sub-sitemaps return 403 Access Denied High
The sitemap index at sitemaps/index.xml.gz is valid, but the sub-sitemaps for company-pages and static-pages return 403 Access Denied from S3. Only the articles sitemap (7MB) is accessible.
What to change: Ensure all sub-sitemaps are publicly accessible and return 200 OK.
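One way to keep this from regressing is a small check that reads the sitemap index, requests every sub-sitemap, and lists any that do not return 200. The sketch below uses only the standard library; the function names and the User-Agent string are hypothetical, and `fetch_status` makes live network requests, so it is shown but not exercised here.

```python
import gzip
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sub_sitemaps(index_bytes: bytes) -> list[str]:
    """Return the <loc> of every sub-sitemap in a (possibly gzipped) index."""
    if index_bytes[:2] == b"\x1f\x8b":  # gzip magic number
        index_bytes = gzip.decompress(index_bytes)
    root = ET.fromstring(index_bytes)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_status(url: str, user_agent: str = "sitemap-check/0.1") -> int:
    """HEAD a URL and return its HTTP status, including 4xx/5xx codes."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def blocked(statuses: dict[str, int]) -> list[str]:
    """List the URLs whose status is anything other than 200."""
    return sorted(url for url, code in statuses.items() if code != 200)
```

A typical run would call `sub_sitemaps` on the downloaded index, build a dict of `fetch_status` results, and fail the check if `blocked` returns anything.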
Internal API keys and service hostnames leaked in client-side environment variables High
The window.env object exposed in every page contains Algolia app ID and API keys, Amplitude API key, GrowthBook client/decryption keys, Google OAuth client ID, and internal service hostnames (expansion_global_host, eba_api_host, etc.).
What to change: Remove sensitive keys from client-side code; use server-side environment variables and proxy API calls through a backend.
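One common pattern for this is an explicit allowlist: only values deliberately marked as public are ever serialized into the page, and everything else stays on the server. The sketch below illustrates the idea; every key name in it is hypothetical, and deciding which real values are safe to expose (search-only keys and OAuth client IDs are often designed to be public, decryption keys generally are not) has to be done per key by the site's engineers.

```python
import json

# Hypothetical names: only keys listed here may reach the browser.
PUBLIC_ENV_ALLOWLIST = frozenset({"PUBLIC_SITE_URL", "PUBLIC_SEARCH_APP_ID"})


def public_env(env: dict, allowlist=PUBLIC_ENV_ALLOWLIST) -> dict:
    """Keep only allowlisted keys; anything unlisted is dropped by default."""
    return {key: value for key, value in env.items() if key in allowlist}


def window_env_script(env: dict) -> str:
    """Render the filtered environment as the inline script for the page."""
    payload = json.dumps(public_env(env), sort_keys=True).replace("</", "<\\/")
    return f"<script>window.env = {payload};</script>"


example = window_env_script({
    "PUBLIC_SITE_URL": "https://www.welcometothejungle.com",
    "INTERNAL_API_HOST": "internal.example.invalid",  # hypothetical value
    "FEATURE_FLAG_DECRYPTION_KEY": "not-a-real-key",  # hypothetical value
})
```

Because unlisted keys are dropped by default, a newly added server-side variable cannot reach the browser unless someone adds it to the allowlist on purpose.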
Robots.txt lacks AI-specific directives for GPTBot, ClaudeBot, Google-Extended, etc. Medium
The robots.txt at www.welcometothejungle.com/robots.txt is permissive with User-agent: * and only disallows /me/*, /settings/*, /users/*, and query-parameter URLs. No AI-specific sections exist, so AI crawlers fall back to the wildcard group and the site has no per-bot control over paths or crawl rate.
What to change: Add explicit sections for AI crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot) with appropriate crawl-delay and disallow rules.
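A sketch of what explicit per-bot groups could look like is shown below, generated and then verified with Python's `urllib.robotparser`. Two simplifications are assumptions of this example: the disallow rules are written as plain prefixes (/me/ rather than /me/*) because the standard-library parser does prefix matching and does not interpret wildcards, and Crawl-delay is left out because it is not part of the Robots Exclusion Protocol standard (RFC 9309) and some crawlers ignore it. The helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ("GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot")
PRIVATE_PREFIXES = ("/me/", "/settings/", "/users/")


def build_robots_txt(agents=AI_AGENTS, private=PRIVATE_PREFIXES) -> str:
    """Emit one group per named agent plus a wildcard fallback group."""
    lines = []
    for agent in (*agents, "*"):
        lines.append(f"User-agent: {agent}")
        # Disallow lines come first: this parser applies the first matching rule.
        lines.extend(f"Disallow: {prefix}" for prefix in private)
        lines.append("Allow: /")
        lines.append("")
    return "\n".join(lines)


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Check a URL against the generated rules for a given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ROBOTS_TXT = build_robots_txt()
```

Running `can_fetch(ROBOTS_TXT, "GPTBot", "https://www.welcometothejungle.com/en/about")` against the generated text is a quick way to confirm the public pages stay crawlable while the account paths stay blocked.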
No search results found for the brand or site on DuckDuckGo Medium
DuckDuckGo returns zero search results for welcometothejungle.com or any query combining the brand name with 'recruitment', 'job board', or 'employer branding'. No discoverable external press mentions, reviews, or Reddit threads.
What to change: Improve SEO fundamentals (SSR, meta tags, structured data) to get pages indexed, and build external backlinks through PR and content marketing.
Detailed LLM prior knowledge is completely absent from the site's machine-readable content High
The LLM knows Welcome to the Jungle in detail (founded 2015, €50M Series C, acquired Hunted, etc.), but none of this information exists in machine-readable form on the site. The gap between the model's prior and what the site exposes to crawlers is total.
What to change: Add structured data (Organization schema) and a comprehensive llms.txt to provide factual brand information to AI crawlers.
Non-www apex domain returns AWS WAF JS challenge to browsers Medium
The non-www apex domain (welcometothejungle.com) returns a 2KB AWS WAF JS challenge to browser UAs, while AI bots get redirected to the www subdomain where they hit the empty shell.
What to change: Redirect the apex domain to the www subdomain with a 301 redirect for all UAs, or serve the same content as the www subdomain.
No canonical links or hreflang tags on any page Medium
Pages lack canonical links and hreflang tags, which can cause duplicate content issues for the multilingual site (en/fr).
What to change: Add canonical links and hreflang tags to all pages to indicate the preferred version and language/region.
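A minimal sketch of generating these tags follows. It assumes localized URLs follow a /{locale}{path} pattern, which is inferred from the /en/about path seen in this audit and may not hold for every page; the function name and parameters are hypothetical.

```python
from html import escape


def alternate_links(base_url: str, path: str, locales, current: str,
                    default: str) -> list[str]:
    """Return canonical and hreflang <link> tags for one page.

    Assumes each locale's URL is base_url + "/" + locale + path.
    """
    def href(locale: str) -> str:
        return f"{base_url}/{locale}{path}"

    # The canonical link points at the locale currently being served.
    tags = [f'<link rel="canonical" href="{escape(href(current))}">']
    for locale in locales:
        tags.append(
            f'<link rel="alternate" hreflang="{escape(locale)}" '
            f'href="{escape(href(locale))}">'
        )
    # x-default tells crawlers which version to use when no locale matches.
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(href(default))}">'
    )
    return tags


links = alternate_links("https://www.welcometothejungle.com", "/about",
                        ["en", "fr"], current="en", default="en")
```

Each language version should emit the full set of alternates, including a reference to itself, and the tags must be in the server-rendered head to be seen by crawlers.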
Wayback Machine snapshot captures only the JavaScript shell Low
The Wayback Machine snapshot from July 2025 captures only the JS shell, so archival crawlers see the same empty page that AI crawlers do.
What to change: Implement SSR or dynamic rendering to ensure content is captured by archival services.
What's working
- Robots.txt is permissive and allows all AI crawlers — The robots.txt at www.welcometothejungle.com/robots.txt allows all user agents with only minimal disallows, and the sitemap is declared. This is a good foundation for crawler access.
- Articles sitemap is accessible and contains thousands of URLs — The articles sub-sitemap (7MB) returns 200 OK and contains thousands of article URLs, providing a path for crawlers to discover content if rendering were fixed.
- LLM holds detailed prior knowledge about the company — The LLM knows Welcome to the Jungle's founding, funding, acquisitions, and market presence, which can be leveraged once the site provides machine-readable content.
- All tested AI bots receive HTTP 200 responses — Every AI bot tested receives a 200 OK response, meaning the site does not block or rate-limit crawlers at the HTTP level.
- Sitemap index is valid and points to multiple sub-sitemaps — The sitemap index at sitemaps/index.xml.gz is valid and contains sub-sitemaps for jobs, companies, articles, and static pages, indicating a structured approach to content organization.
- Site uses AWS CloudFront CDN for global delivery — The site is served via AWS CloudFront, which provides fast global content delivery and DDoS protection.