AI Site Grade

myfitnesspal.com — AI Site Grade

MyFitnessPal's blog — its richest, most AI-valuable content — is completely invisible to every major AI crawler, blocked at the Cloudflare edge with a 403 challenge page, despite a robots.txt that explicitly invites them in.

MyFitnessPal's blog is entirely blocked to AI crawlers by Cloudflare, the main site lacks any schema markup, and the sitemap omits the blog and community, severely limiting AI visibility.

Findings: 9
Evidence checks: 31
Completed: 30 May 2026

Analysis

Crawler Access Split

The main domain (www.myfitnesspal.com) admits 9 of the 11 AI bots tested, including GPTBot, Google-Extended, PerplexityBot, OAI-SearchBot, and ChatGPT-User. ClaudeBot gets a 403 (Cloudflare "Blocked" page) while anthropic-ai passes, an inconsistent posture toward Anthropic's two crawler tokens. The blog subdomain (blog.myfitnesspal.com) is a different story: every AI bot tested there returns 403. GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, ChatGPT-User, Applebot-Extended, Bytespider, and anthropic-ai are all blocked. The blog's robots.txt explicitly allows each of them with Allow: /, but robots.txt is only advisory: Cloudflare's WAF rejects the request at the edge, so the permission never takes effect. The blog homepage alone carries 1,500+ words of structured nutrition content, recipes, and expert articles, none of which reaches LLM training pipelines or AI search engines.
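
The mismatch between what robots.txt permits and what the edge actually serves can be checked mechanically. The sketch below is one possible way to do it in Python with the standard library only. The helper names (fetch_status, edge_overrides) are hypothetical, and sending the bare bot token as the User-Agent is a simplifying assumption, since real crawlers send longer strings and a WAF may treat those differently.

```python
from urllib import robotparser
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Bot tokens named in this report. Using the bare token as the whole
# User-Agent header is an assumption made for illustration.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot",
    "OAI-SearchBot", "ChatGPT-User", "Applebot-Extended",
    "Bytespider", "anthropic-ai",
]


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code seen when requesting url with this UA."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still the answer we want.
        return err.code


def edge_overrides(robots_txt: str, url: str, statuses: dict[str, int]) -> list[str]:
    """List bots that robots.txt allows for url but that received a 403."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        bot for bot, status in statuses.items()
        if parser.can_fetch(bot, url) and status == 403
    ]
```

In use, one would collect `{bot: fetch_status(url, bot) for bot in AI_BOTS}` for a blog URL, download that host's robots.txt, and pass both to edge_overrides. Any bot it returns is invited by robots.txt yet refused at the edge.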

Cold-Knowledge Gap

The LLM knows MyFitnessPal as a "calorie tracking and nutrition logging app" with "over 200 million registered users," recalls the Under Armour acquisition ($475M in 2015) and Francisco Partners sale (2020), and flags the 2018 data breach (150 million accounts) and user frustration with premium upsells. The site itself never mentions the data breach, the acquisition history, or the user count (200M+) on the homepage — it leads with "3.5 Million 5-Star Ratings" and "over 20 million global foods." The cold model's knowledge is more historically accurate than the homepage's self-presentation, which omits the brand's scale and its well-documented security incident entirely.

Schema Posture

The main site pages — homepage, premium, BMR calculator, contact-us — contain zero JSON-LD schema of any type. No Organization, WebSite, FAQPage, Product, or SoftwareApplication markup. The blog subdomain is the only property with structured data: WebPage, WebSite, Organization, BreadcrumbList, and SearchAction schema. The blog's Organization schema includes sameAs links to Facebook, X (Twitter), TikTok, and Instagram. The main domain's complete lack of schema means AI crawlers that do get through (GPTBot, Google-Extended, etc.) receive no entity-level signals about what MyFitnessPal is, who owns it, or how it relates to competitors.
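
As a rough illustration of the missing markup, the Python sketch below assembles a JSON-LD graph with Organization, WebSite, and SoftwareApplication nodes. It is not MyFitnessPal's actual schema. The function names are hypothetical, the applicationCategory and operatingSystem values are assumptions, and the sameAs entries must come from the site owner, so the usage note uses placeholders.

```python
import json


def build_site_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a JSON-LD document describing the brand, its site, and its app."""
    org_id = f"{url}/#organization"
    graph = [
        {
            "@type": "Organization",
            "@id": org_id,
            "name": name,
            "url": url,
            "sameAs": list(same_as),  # social profile URLs supplied by the owner
        },
        {
            "@type": "WebSite",
            "@id": f"{url}/#website",
            "name": name,
            "url": url,
            "publisher": {"@id": org_id},
        },
        {
            "@type": "SoftwareApplication",
            "name": name,
            "applicationCategory": "HealthApplication",  # assumed category
            "operatingSystem": "iOS, Android",           # assumed from the app pages
            "publisher": {"@id": org_id},
        },
    ]
    return json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap the JSON-LD so it can be placed in a page's <head>."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'
```

Calling `as_script_tag(build_site_jsonld("MyFitnessPal", "https://www.myfitnesspal.com", ["https://example.com/placeholder-profile"]))` yields a block that could be templated into each main-site page, with the placeholder replaced by the real profile URLs.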

Sitemap and Content Architecture

The sitemap contains only 7 URLs: homepage, premium, exercise lookup, apps, mobile/iphone, mobile/android, and contact-us. No blog posts, no community pages, no tool pages, no recipe pages. The blog, community forum (community.myfitnesspal.com), and support center (support.myfitnesspal.com) all sit on separate subdomains that the main sitemap excludes. The blog has its own WordPress sitemap, but AI crawlers cannot reach it because of the Cloudflare block. The exercise/lookup page returns zero visible text (a JS-rendered shell), and support.myfitnesspal.com returns 403 even to a browser UA. A request for /llms.txt returns a 404 whose error page carries a copyright footer reading "2006-16", a stale artifact suggesting the site has not been reviewed for AI discoverability.
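
One way to tie the subdomains together is a sitemap index that points at each property's own sitemap. The Python sketch below builds such a file with the standard library. The function name is hypothetical and the sitemap URLs in the usage note are assumed paths, not confirmed locations. The sitemaps.org protocol also restricts cross-host references unless ownership is demonstrated, for example through each host's robots.txt, so this is a starting point rather than a complete fix.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return a sitemap index document listing the given sitemap URLs."""
    # Register the default namespace so tags serialize without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
    # Serializing to UTF-8 bytes lets ElementTree emit the XML declaration.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True).decode("utf-8")
```

For example, `build_sitemap_index(["https://www.myfitnesspal.com/sitemap.xml", "https://blog.myfitnesspal.com/wp-sitemap.xml"])` would produce an index with two entries. Both paths are guesses and should be replaced with the real sitemap locations.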

Findings

  1. Blog completely blocked to all major AI crawlers by Cloudflare WAF High

    The blog subdomain (blog.myfitnesspal.com) returns 403 for every AI bot tested (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, ChatGPT-User, Applebot-Extended, Bytespider, anthropic-ai) despite robots.txt allowing them. Cloudflare's WAF rejects these requests at the edge regardless of robots.txt, making the blog's nutrition content (1,500+ words on the homepage alone) invisible to LLMs and AI search engines.

    What to change: Update Cloudflare WAF rules to allow AI crawlers listed in robots.txt, or serve the blog from a path on the main domain to bypass subdomain-level blocks.

  2. Main site pages lack any JSON-LD schema markup High

    The homepage, premium page, BMR calculator, and contact-us page contain zero JSON-LD schema. No Organization, WebSite, FAQPage, Product, or SoftwareApplication markup exists, so AI crawlers receive no entity-level signals about MyFitnessPal's identity, offerings, or relationships.

    What to change: Add JSON-LD schema (Organization, WebSite, SoftwareApplication, FAQPage) to all main site pages to provide structured entity data to AI crawlers.

  3. Sitemap contains only 7 URLs, omitting blog and community High

    The main sitemap lists only 7 URLs (homepage, premium, exercise lookup, apps, mobile/iphone, mobile/android, contact-us). The blog, community forum, and support center are on separate subdomains and not included. The blog's own sitemap is unreachable by AI crawlers due to the Cloudflare block.

    What to change: Include blog and community URLs in the main sitemap, or add a sitemap index that references subdomain sitemaps. Ensure blog sitemap is accessible to AI crawlers.

  4. ClaudeBot blocked on main domain while anthropic-ai passes Medium

    On the main domain, ClaudeBot receives a 403 (Cloudflare block) while anthropic-ai is allowed, creating an inconsistent Anthropic-bot posture. Because the two tokens are treated differently, the block looks unintentional, and it may reduce visibility to Claude-based AI systems.

    What to change: Allow ClaudeBot in Cloudflare WAF rules to match the anthropic-ai allowance, ensuring consistent access for Anthropic crawlers.

  5. Exercise lookup page returns zero visible text Medium

    The /exercise/lookup page returns 200 but contains 0 words of visible text, indicating a JavaScript-rendered shell. AI crawlers that do not execute JS see an empty page, making exercise data inaccessible.

    What to change: Server-side render the exercise lookup content or provide a static HTML fallback for crawlers.

  6. Support subdomain returns 403 even to browser UAs Medium

    The support subdomain (support.myfitnesspal.com) returns a 403 'Just a moment...' page, blocking all access including AI crawlers and likely human visitors. This prevents AI systems from indexing help content.

    What to change: Investigate and resolve the 403 on the support subdomain to allow access for both users and crawlers.

  7. llms.txt file returns 404 with stale copyright Medium

    The llms.txt file at /llms.txt returns a 404, and the error page shows a copyright footer dating to '2006-16', indicating the site has not been reviewed for AI discoverability. No AI-specific guidance file exists.

    What to change: Create an llms.txt file with a summary of the site's content and pointers to key pages for AI crawlers.
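
A minimal sketch of what such a file could look like, generated in Python. It follows the layout of the llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links). The function name is hypothetical, and every title, description, and URL passed to it is a placeholder for the site owner to decide.

```python
def build_llms_txt(title: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render llms.txt content from a title, a summary, and link sections.

    sections maps a heading to (label, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"
```

A call such as `build_llms_txt("MyFitnessPal", "Calorie tracking and nutrition logging app.", {"Blog": [("Blog home", "https://blog.myfitnesspal.com/", "Nutrition articles and recipes")]})` returns text that could be served at /llms.txt once the sections reflect the pages the site actually wants surfaced.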

  8. Homepage omits brand scale and data breach history Low

    The homepage highlights '3.5 Million 5-Star Ratings' and 'over 20 million global foods' but does not mention the 200 million registered users or the 2018 data breach. The LLM's cold knowledge is more historically accurate, creating a gap between self-presentation and public record.

    What to change: Consider adding user count and a brief security note to the homepage to align with public knowledge and improve trust signals.

  9. Community forum page returns zero visible text Medium

    The community forum (community.myfitnesspal.com/en/categories) returns 200 but 0 words of visible text, likely a JS-rendered shell. AI crawlers see an empty page, making community discussions invisible.

    What to change: Server-side render the community categories or provide a static HTML fallback for crawlers.
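
Findings 5 and 9 both rest on counting the words a non-JavaScript crawler can see in the raw HTML. The Python sketch below shows one way to approximate that count with the standard library. The class and function names are hypothetical, and the choice of which tags to ignore (script, style, template, title) is an assumption, so the numbers may differ from the tool that produced this report.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect words from text nodes, ignoring non-rendered containers."""

    SKIP = {"script", "style", "template", "title"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0           # > 0 while inside a skipped element
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    """Count whitespace-separated words in the visible text of html."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)
```

A page that returns 200 but scores 0 here is a strong hint that its content only appears after client-side rendering, which is the situation described for the exercise lookup and community category pages.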

What's working

  • Main domain allows 9 of 11 AI bots tested — The main domain (www.myfitnesspal.com) passes GPTBot, Google-Extended, PerplexityBot, OAI-SearchBot, ChatGPT-User, and others, providing broad AI crawler access to homepage and key pages.
  • Blog robots.txt explicitly allows all major AI bots — The blog's robots.txt contains Allow directives for GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, and others, indicating intent to welcome AI crawlers.
  • Blog includes WebPage, WebSite, Organization, BreadcrumbList, and SearchAction schema — The blog subdomain contains JSON-LD schema for WebPage, WebSite, Organization, BreadcrumbList, and SearchAction, providing structured entity data to crawlers that can access it.
  • Blog Organization schema includes sameAs links to social profiles — The blog's Organization schema includes sameAs URLs for Facebook, X (Twitter), TikTok, and Instagram, helping AI systems connect the brand to its social presence.
  • Blog contains 1,500+ words of rich nutrition and recipe content — The blog homepage has over 1,500 words of structured nutrition content, recipes, and expert articles, which would be highly valuable for AI training and search if accessible.
  • Main domain robots.txt allows all major AI bots — The main domain's robots.txt has Allow rules for GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, and others, showing intent to permit AI crawlers.
