AI Site Grade
mlaglobal.com — AI Site Grade
MLA Global's sitemap returns 403/404, leaving AI crawlers without a structured map to thousands of pages of legal recruiting content.
The site has no structured data at all, a broken sitemap, and no detectable external web presence, which severely limits its AI visibility even though it allows all crawlers.
- Findings: 12
- Evidence checks: 29
- Completed: 30 May 2026
Analysis
The Global Navigators of Legal Careers — But Invisible to AI Discovery
The site's sitemap, a key discovery signal for AI crawlers, returns 403 Forbidden, and the gzipped sitemap referenced in robots.txt returns 404 Not Found. GPTBot, ClaudeBot, PerplexityBot, and every other AI crawler that visits this domain receives a 200 on the homepage but has no structured map of the site's content. Thousands of pages of thought leadership, consultant profiles, and placement data can be reached only by following internal links, which leaves much of that content effectively undiscoverable.
Crawler Access
All AI crawler user agents tested (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, Bytespider, Applebot-Extended) receive HTTP 200 with full content on the homepage and on every tested subpage. No user-agent-based blocking was observed. The robots.txt is minimal: a single wildcard (User-agent: *) rule disallowing /sitecore* paths (the CMS backend), with no AI-bot-specific directives. The llms.txt endpoint returns 403 Forbidden, so no AI-friendly content map is published. The site runs on Sitecore CMS behind an Azure edge (the x-azure-ref and x-fd-int-roxy-purgeid headers are characteristic of Azure Front Door) and sends strong security headers (HSTS, CSP, X-Frame-Options).
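A crawler-access check of this kind can be approximated with Python's standard-library urllib.robotparser. The sketch below runs against a hypothetical reconstruction of the robots.txt described above, not the live file: the host and page paths are assumed, and because the standard-library parser matches plain path prefixes and does not implement * wildcards, the reported /sitecore* rule is written as the prefix /sitecore. It evaluates robots.txt rules only; server-side user-agent blocking still needs real HTTP requests sent with each user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the robots.txt described in this report.
# urllib.robotparser does plain prefix matching (no "*" wildcards), so the
# reported "/sitecore*" rule is approximated by the prefix "/sitecore".
# The sitemap host is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitecore
Sitemap: https://mlaglobal.com/_assets/mlaglobal-sitemap-en.xml.gz
"""

# AI crawler user-agent tokens named in this report.
AI_AGENTS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot",
    "OAI-SearchBot", "Bytespider", "Applebot-Extended",
]


def audit_robots(robots_txt: str, agents: list[str], urls: list[str]) -> dict:
    """Return per-agent robots.txt verdicts for each URL plus declared sitemaps."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    access = {
        agent: {url: parser.can_fetch(agent, url) for url in urls}
        for agent in agents
    }
    # site_maps() (Python 3.8+) returns None when no Sitemap lines exist.
    return {"access": access, "sitemaps": parser.site_maps() or []}


if __name__ == "__main__":
    result = audit_robots(
        ROBOTS_TXT,
        AI_AGENTS,
        ["https://mlaglobal.com/en/insights", "https://mlaglobal.com/sitecore/shell"],
    )
    for agent, verdicts in result["access"].items():
        print(agent, verdicts)
    print("Declared sitemaps:", result["sitemaps"])
```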
Content & Schema Posture
Every page tested (homepage, service pages, insights, placements) is entirely devoid of JSON-LD structured data: no Organization, WebSite, BreadcrumbList, FAQPage, or Article schema exists anywhere. The homepage uses a single H1 ("The Global Navigators of Legal Careers") and H2 headings for the two primary CTAs. Service pages such as In-House Counsel Recruiting carry substantial content (1,400+ words) with statistics (646 GC placements, 74% stick ratio) but no schema to help AI engines extract those facts. The Insights and Press Releases pages are JavaScript-dependent search interfaces that, when fetched with a plain GET request, return "There are no search result for the entered value", so AI crawlers that do not execute JavaScript see empty shells with no article listings.
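The schema check can be reproduced on raw HTML, which is roughly what a crawler that does not run JavaScript receives. This is a minimal sketch using only the standard library, not the audit's own tooling: it collects every script block of type application/ld+json and returns the schema.org @type values it declares, including entities nested in an @graph array. An empty result corresponds to the "no structured data" finding.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed bodies of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[object] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._buffer).strip()
            try:
                self.blocks.append(json.loads(raw))
            except json.JSONDecodeError:
                # Keep malformed JSON-LD as a raw string so it can be reported.
                self.blocks.append(raw)


def schema_types(html: str) -> list[str]:
    """Return the schema.org @type values declared in a page's JSON-LD."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    types: list[str] = []
    for block in extractor.blocks:
        nodes = block if isinstance(block, list) else [block]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            # Several entities are often wrapped in a single "@graph" array.
            for entity in node.get("@graph", [node]):
                if isinstance(entity, dict) and "@type" in entity:
                    value = entity["@type"]
                    types.extend(value if isinstance(value, list) else [value])
    return types
```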
Cold-Knowledge Gap
The LLM's prior knowledge describes MLA as "one of the largest legal search firms globally" with "offices in the U.S., Europe, Asia, and the Middle East," and it is aware of the Partner Compensation Surveys and Associate Salary Surveys. The site itself claims 1,162 placements in 2024, 200+ recruiters globally, 27 locations, and over 40 years of operation. The gap: the LLM knows the brand's compensation surveys but has no structured way to cite the specific 2024 figures ($1,411,000 average partner compensation, a 26% increase from 2022), which live on the info.mlaglobal.com subdomain, a HubSpot-hosted microsite with its own separate sitemap. The LLM also refers to "Allegis Partners," whereas the site describes MLA as "an Allegis Group company," a subtle but real discrepancy.
External Signals
The audit's web searches returned no indexed results for the brand across multiple queries: no press mentions, no Reddit threads, and no review sites surfaced. The DNS TXT records include an OpenAI domain verification token (openai-domain-verification=dv-WVbOmOusWibQgtktbDbl2npC), showing that the brand has verified the domain with OpenAI. This kind of record proves domain ownership to OpenAI; it does not by itself change how OpenAI's crawlers treat the site. Social links point to LinkedIn, X, Facebook, YouTube, and Instagram. The careers.mlaglobal.com subdomain (powered by a separate platform) is the only property that carries any JSON-LD schema, a basic WebSite with SearchAction.
Surprising Findings
The copyright footer reads "© 2026", which matches the year of this audit (completed 30 May 2026) and is therefore current rather than a configuration error, though it may be hard-coded. The Sitemap directive in robots.txt points to /_assets/mlaglobal-sitemap-en.xml.gz, but the actual path resolves to /en/_assets/... after the Sitecore redirect, and the server explicitly forbids the .xml version. The info.mlaglobal.com subdomain (HubSpot) hosts the Partner Compensation Survey and other gated content, yet it returns a 404 on its root and shows a 2017 copyright in its 404 template: a fragmented brand experience across two platforms with no cross-linking schema.
Findings
Sitemap returns 403 Forbidden and 404 Not Found (High)
The gzipped sitemap at /_assets/mlaglobal-sitemap-en.xml.gz returns 404, and the .xml version returns 403. AI crawlers have no structured map of the site's content and must rely on internal links, leaving thousands of pages hard to discover.
What to change: Fix the sitemap URL configuration so it returns 200 with a valid XML sitemap listing all public pages, and point the Sitemap directive in robots.txt at the final, post-redirect URL.
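Once the sitemap URL is fixed, the response body can be sanity-checked before it is resubmitted. The sketch below assumes the body has already been downloaded with a 200 status (fetching and redirect handling are left out) and accepts either the gzipped form referenced in robots.txt or plain XML, for both a urlset and a sitemap index, using the namespace defined by the sitemaps.org protocol. The example URLs in the test are placeholders.

```python
import gzip
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(payload: bytes) -> dict:
    """Parse a sitemap or sitemap index body, gzipped or plain.

    Returns {"kind": "urlset" or "sitemapindex", "locs": [...]}.
    Raises ValueError for a non-sitemap root element, and
    xml.etree.ElementTree.ParseError if the body is not well-formed XML.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if payload[:2] == b"\x1f\x8b":
        payload = gzip.decompress(payload)
    root = ET.fromstring(payload)
    kind = root.tag.replace(SITEMAP_NS, "", 1)
    if kind not in ("urlset", "sitemapindex"):
        raise ValueError(f"unexpected root element: {root.tag}")
    # <loc> appears under <url> in a urlset and under <sitemap> in an index.
    locs = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    return {"kind": kind, "locs": locs}
```

An empty locs list, a ValueError, or a ParseError (for example, when an HTML error page is served in place of the sitemap) means the file would not help crawlers even if it returned 200.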
No JSON-LD structured data on any page (High)
Every tested page lacks Organization, WebSite, BreadcrumbList, Article, or any other JSON-LD schema. Without schema, AI engines must infer facts such as placement numbers or compensation data from unstructured prose, which makes those facts harder to extract and cite reliably.
What to change: Add JSON-LD structured data for Organization, WebSite, BreadcrumbList, and Article schema on all pages.
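A minimal sketch of site-wide Organization and WebSite markup follows, built in Python for illustration; on Sitecore the same JSON would be emitted by the page templates. The display name, URL, and empty sameAs list are placeholders to be replaced with the firm's official values; only the Allegis Group parent relationship comes from the site itself. Escaping < as \u003c keeps a stray "</script>" inside a value from closing the tag early.

```python
import json


def organization_jsonld(name: str, url: str, parent: str, same_as: list[str]) -> dict:
    """Build linked Organization and WebSite entities as one @graph."""
    org_id = f"{url}#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": name,
                "url": url,
                "parentOrganization": {"@type": "Organization", "name": parent},
                # Official profile URLs (LinkedIn, X, YouTube, ...) belong here.
                "sameAs": same_as,
            },
            {
                "@type": "WebSite",
                "@id": f"{url}#website",
                "name": name,
                "url": url,
                "publisher": {"@id": org_id},
            },
        ],
    }


def to_script_tag(data: dict) -> str:
    """Serialize JSON-LD for embedding; escaping "<" stops a value from closing the tag."""
    body = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{body}</script>'


if __name__ == "__main__":
    # Placeholder values: replace with the firm's official name, URL and profiles.
    print(to_script_tag(organization_jsonld("MLA Global", "https://mlaglobal.com/", "Allegis Group", [])))
```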
Insights and Press Releases pages are JavaScript-dependent search interfaces (High)
The Insights and Press Releases pages return empty shells with no article listings when fetched without JavaScript. AI crawlers see no content, making thought leadership invisible.
What to change: Implement server-side rendering or static HTML fallback for these pages so AI crawlers can index article listings.
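A static fallback can be as simple as emitting the article list as plain links in the initial HTML response, so the JavaScript search enhances the page rather than supplying its only content. The sketch below is illustrative: the InsightLink fields, the insights-list class name, and the URL paths are hypothetical, and on Sitecore this would live in the server-side rendering layer rather than in Python.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class InsightLink:
    title: str
    url: str
    published: str  # ISO 8601 date, e.g. "2024-05-01", so string order is date order


def render_listing(items: list[InsightLink]) -> str:
    """Render a crawlable, JavaScript-free article list, newest first."""
    ordered = sorted(items, key=lambda item: item.published, reverse=True)
    rows = [
        f'<li><a href="{escape(item.url)}">{escape(item.title)}</a> '
        f'<time datetime="{escape(item.published)}">{escape(item.published)}</time></li>'
        for item in ordered
    ]
    return '<ul class="insights-list">\n' + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    print(render_listing([
        InsightLink("Example insight", "/en/insights/example", "2024-05-01"),
    ]))
```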
llms.txt endpoint returns 403 Forbidden (Medium)
The /llms.txt path, where an AI-friendly content map would be published, returns 403. llms.txt is a proposed convention rather than an established standard, but without it AI tools have no curated index of the site's key resources.
What to change: Publish an llms.txt file at the root with links to key pages and resources.
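The sketch below assembles an llms.txt body following the layout in the llmstxt.org proposal: an H1 title, a blockquote summary, then H2 sections of Markdown links. The section names, page titles, and URLs are hypothetical placeholders; the file would need to be served as plain text at /llms.txt, which currently returns 403.

```python
def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble an llms.txt body: H1 title, blockquote summary, H2 link sections.

    Each section maps to (title, url, note) tuples; an empty note is omitted.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sections and URLs, for illustration only.
    print(build_llms_txt(
        "MLA Global",
        "Legal search firm, an Allegis Group company.",
        {
            "Services": [("In-House Counsel Recruiting", "https://mlaglobal.com/en/in-house-counsel", "")],
            "Insights": [("Insights", "https://mlaglobal.com/en/insights", "")],
        },
    ))
```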
Zero external web presence in search results (High)
The audit's web searches for the brand returned zero indexed results across news, reviews, and social platforms. This severely limits the off-site signals that AI models use to judge authority.
What to change: Invest in PR, backlinks, and review generation to build external signals.
Copyright footer displays year 2026 (Low)
The footer reads '© 2026'. This matches the year of the audit (completed 30 May 2026), so the year itself is current rather than a configuration error; the remaining risk is that it is hard-coded and will go stale.
What to change: Generate the copyright year dynamically rather than hard-coding it.
info.mlaglobal.com root returns 404 (Medium)
The HubSpot-hosted subdomain for gated content (e.g., Partner Compensation Survey) returns 404 on its root, creating a fragmented brand experience.
What to change: Set up a landing page or redirect at the root of info.mlaglobal.com.
Robots.txt has no AI-bot-specific rules (Low)
The robots.txt has only a wildcard group disallowing /sitecore* paths. No AI crawlers are explicitly allowed or disallowed, which misses an opportunity to state the site's crawler policy explicitly.
What to change: Add explicit groups for AI crawlers (e.g., GPTBot, ClaudeBot) that allow key content, repeating the /sitecore* disallow in each group, because a crawler that matches a named group does not fall back to the wildcard group.
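One way to generate such a file is sketched below. The agent list is taken from this report (which crawlers to allow is a policy decision), the sitemap URL is a placeholder, and /sitecore is used because a trailing * on a path prefix is redundant under RFC 9309. Each named group repeats the Disallow rule, since under RFC 9309 a crawler that matches a named group does not also apply the "*" group. The main block re-parses the output with urllib.robotparser as a quick self-check.

```python
from urllib.robotparser import RobotFileParser

# AI crawler tokens named in this report.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Applebot-Extended", "Bytespider",
]


def build_robots_txt(agents: list[str], disallow: list[str], sitemap_url: str) -> str:
    """Emit one explicit group per AI crawler plus a wildcard fallback group.

    The shared Disallow rules are repeated in every group because a crawler
    that matches a named group ignores the "*" group.
    """
    groups = []
    for agent in [*agents, "*"]:
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in disallow]
        lines.append("Allow: /")
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + f"\n\nSitemap: {sitemap_url}\n"


if __name__ == "__main__":
    # The sitemap URL is a placeholder for the corrected, post-redirect location.
    text = build_robots_txt(AI_AGENTS, ["/sitecore"], "https://mlaglobal.com/sitemap.xml")
    print(text)
    checker = RobotFileParser()
    checker.parse(text.splitlines())
    print("GPTBot may fetch /sitecore/login:", checker.can_fetch("GPTBot", "https://mlaglobal.com/sitecore/login"))
```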
No breadcrumb navigation schema (Medium)
The site lacks BreadcrumbList schema, which helps AI models understand page hierarchy and context.
What to change: Add BreadcrumbList JSON-LD to all pages.
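BreadcrumbList markup is a short ordered list of ListItem entries. The helper below is a sketch; the trail in the usage example is hypothetical and would in practice come from the CMS's page hierarchy.

```python
import json


def breadcrumb_jsonld(trail: list[tuple[str, str]]) -> dict:
    """Build schema.org BreadcrumbList data from (name, url) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            # Positions are 1-based and follow the order of the trail.
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


if __name__ == "__main__":
    # Hypothetical trail for an Insights page.
    print(json.dumps(breadcrumb_jsonld([
        ("Home", "https://mlaglobal.com/en"),
        ("Insights", "https://mlaglobal.com/en/insights"),
    ]), indent=2))
```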
Insights pages lack Article schema (Medium)
Even once article listings are rendered server-side, the articles themselves lack Article schema, making it harder for AI engines to extract publication dates, authors, and summaries.
What to change: Add Article schema to all insight articles.
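A per-article sketch follows. All field values in the usage example (headline, author, date, URL) are hypothetical; in practice they would be read from the article's CMS fields so that the markup matches what the page shows.

```python
import json
from datetime import date


def article_jsonld(headline: str, url: str, author: str, published: date,
                   publisher: str, description: str) -> dict:
    """Build schema.org Article data for a single insight page."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
        "author": {"@type": "Person", "name": author},
        "datePublished": published.isoformat(),  # ISO 8601, e.g. "2024-05-01"
        "publisher": {"@type": "Organization", "name": publisher},
        "description": description,
    }


if __name__ == "__main__":
    # Every value here is a hypothetical placeholder.
    print(json.dumps(article_jsonld(
        "Example insight headline",
        "https://mlaglobal.com/en/insights/example",
        "Example Author",
        date(2024, 5, 1),
        "MLA Global",
        "One-sentence summary of the article.",
    ), indent=2))
```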
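Service pages lack FAQ schema for common questions (Low)
Service pages contain substantial content but no FAQPage schema. Google now shows FAQ rich results only for a limited set of authoritative government and health sites, so the main benefit here is that question-and-answer markup is easier for AI engines to parse, not rich-result eligibility.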
What to change: Add FAQPage schema for common questions on service pages.
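A sketch of FAQPage markup follows. The question in the usage example is illustrative only, built from a statistic the In-House Counsel Recruiting page already publishes; each answer in the markup should repeat text that is visible on the page.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build schema.org FAQPage data from (question, answer) pairs.

    Each answer should repeat text that is visible on the page itself.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    # Illustrative pair built from a figure the service page already states.
    print(json.dumps(faq_jsonld([
        ("How many general counsel placements has the in-house practice made?",
         "The In-House Counsel Recruiting page cites 646 GC placements."),
    ]), indent=2))
```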
Fragmented brand experience across platforms (Low)
The main site (Sitecore) and info subdomain (HubSpot) have no cross-linking schema, and the info subdomain has a 2017 copyright in its 404 template, creating inconsistency.
What to change: Ensure consistent branding and cross-linking between the main site and subdomains.
What's working
- All major AI crawlers receive HTTP 200 with full content — GPTBot, ClaudeBot, Google-Extended, and others are not blocked by robots.txt or server rules, allowing them to access the homepage and subpages.
- OpenAI domain verification token present: The DNS TXT records include an OpenAI domain verification token, showing that the brand has proactively verified the domain with OpenAI.
- Strong security headers in place — HSTS, CSP, and X-Frame-Options headers are present, providing good security posture.
- Careers subdomain has basic WebSite schema — The careers.mlaglobal.com subdomain includes a WebSite schema with SearchAction, providing some structured data for job searches.
- Service pages contain substantial, data-rich content — Pages like In-House Counsel Recruiting have over 1,400 words with statistics (646 GC placements, 74% stick ratio), providing valuable information for AI models.
- Social media links present on site — Links to LinkedIn, X, Facebook, YouTube, and Instagram are available, providing potential off-site signals.
- LLM prior knowledge includes brand and key surveys — The LLM knows MLA as a large legal search firm and is aware of its Partner Compensation Surveys, providing a baseline for AI recognition.