Does ChatGPT-User mean the same thing as OAI-SearchBot?

No. OAI-SearchBot is OpenAI search crawling. ChatGPT-User is a user-triggered live retrieval fetcher. This study reports them separately.

Does the baseline prove AI crawlers prefer Markdown?

Not yet. OAI-SearchBot shows a small directional Markdown edge on best-for pages, but it is not statistically significant in the baseline window.

What is the primary outcome in this experiment?

The primary outcome is page-level coverage by assigned variant: the share of randomized Markdown or HTML pages each crawler reached at least once.

Research

Study 010

Do AI Crawlers Prefer Markdown?

Name: Do AI Crawlers Prefer Markdown? dataset
Creator: Trakkr
Published: 2026-05-06
Keywords: AI crawlers, Markdown experiment, OAI-SearchBot, ChatGPT retrieval

Every few weeks someone posts with total confidence that AI crawlers love Markdown, or that they ignore it entirely. Nobody seems to run the test. So we did: same URL, two formats, five crawlers. They don't agree with each other.

8,035

Pages randomized

56K

Experiment events

192K

Live retrieval fetches

77.8%

Smaller payload

Last updated · May 21, 2026

Half the GEO industry will tell you AI crawlers love Markdown. The other half says it makes no difference. Almost nobody has the data to back either claim. We built the test that people keep talking about but nobody seems to actually run: every public page on trakkr.ai gets randomly served as either Markdown or HTML, that assignment stays locked to the URL, and we track what five different AI crawlers do with each version.

Short version: the two crawlers that have clearly picked a side are both training scrapers (GPTBot and ClaudeBot), and both lean toward HTML. But training scrapers feed future models; they're not what decides whether you get cited when you're talking to ChatGPT today. The crawlers that actually drive live citations haven't picked a side. Worth knowing, but not something I'd act on yet.

What I find more interesting is actually underneath the headline results. Once you strip the nav, scripts, and tracking pixels out of a typical HTML page, what's left is just the answer. The Markdown version is about a fifth of the size. Cheaper to fetch, faster to parse. If retrieval cost ever becomes a routing signal inside the AI labs (and I'd bet it already is in at least one of them), that's the thing to watch. Numbers below refresh every Monday.

[01]

The experiment

/ Same URL, two formats, randomly assigned

Design

Same page. Two formats. Locked to the URL.

Every public page on trakkr.ai gets one of two surfaces (Markdown or HTML), based on a hash of its URL. Same content, same canonical, different format. The crawlers don't know they're in a test; they just see a page. And because the assignment is locked to the URL, the same page always serves the same surface. When a crawler comes back, we know which arm it's on. That's what makes the comparison clean.

01The pages

0,035

pages in the test

Every public trakkr.ai URL enters once, then stays put for the rest of the experiment.

02The split

50 / 50

random but stable

4,551markdownhtml4,484

03The data

crawler fetches measured

Concentrated across 0 recommendation pages, our most heavily crawled URLs.

Which pages count

Every public, indexable trakkr.ai page with a stable URL. Login-walled routes, redirects, and preview pages are left out.

How pages get assigned

Each URL is hashed with the experiment ID, and the hash decides Markdown or HTML. Same URL, same surface, every fetch; a crawler can't end up seeing both versions. Observed split: 50.4% Markdown, the rest HTML.

What 'recommendation pages' means

Our pages at /ai-recommends/<product>/<audience> (think “best AI transcription for nonprofits”). There are thousands of them, which is why this slice has enough volume to detect small effects. The results scoreboard runs against this slice.

[02]

Three OpenAI bots, three different jobs

/ Why mixing them up gives misleading answers

Three crawlers, three jobs

OpenAI runs three crawlers. Most write-ups blur them into one.

They shouldn't. Each one does a different job, runs on a different schedule, and responds to format differently. If you count GPTBot fetches as evidence of live citations, or read ChatGPT-User numbers as proof of search indexing, you're measuring the wrong thing. We keep all three separate from here.

01Search

OAI-SearchBot

Search indexing crawler

Crawls steadily, like a search engine

Whether a page is being fetched for OpenAI's search index and citation-surfacing systems. Closest OpenAI signal for future ChatGPT Search discoverability.

02Interaction

ChatGPT-User

Live retrieval fetcher

Spikes when users ask questions

Whether ChatGPT opened a public page during a user-triggered retrieval action. A conversation-time demand signal, not an automatic search crawler.

03Training

GPTBot

Training crawler

Comes in heavy bursts on a schedule

Whether OpenAI's training crawler requested the page. Useful for crawl behavior, but not evidence that the page was retrieved for a live citation.

[03]

The results so far

/ Two settled signals, three still moving

Baseline results · recommendation pages

Two settled signals, three still moving.

On our recommendation pages, 0 of the 5 crawlers now show a clear Markdown-vs-HTML preference, and both of them lean toward HTML. 1 leans the other way, toward Markdown, but not by enough to call yet. The other 2 sit flat, reaching both formats at about the same rate. Click any row to see the numbers behind it.

ChatGPT-User is directionally Markdown-skewed on best-for pages.

Reached Markdown pages69.0%(772 of 1,119)Reached HTML pages68.3%(772 of 1,130)Significancep=0.731

The search crawler is slightly more likely to cover Markdown-assigned best-for pages so far, but the difference is not strong enough to call.

Reached Markdown pages73.9%(827 of 1,119)Reached HTML pages71.8%(811 of 1,130)Significancep=0.255

PerplexityBot is directionally HTML-skewed on best-for pages.

Reached Markdown pages13.3%(149 of 1,119)Reached HTML pages13.6%(154 of 1,130)Significancep=0.828

ClaudeBot is directionally HTML-skewed on best-for pages.

Reached Markdown pages16.9%(189 of 1,119)Reached HTML pages22.3%(252 of 1,130)Significancep=0.001

GPTBot is behaving unlike the search crawler; the study reports it as a training-crawler anomaly rather than a live citation signal.

Reached Markdown pages4.9%(55 of 1,119)Reached HTML pages64.0%(723 of 1,130)Significancep=<0.001

← Prefers HTMLPrefers Markdown →

How to read this

The two settled results on the board both belong to training crawlers, and both lean HTML: GPTBot hard (it reached barely any Markdown pages at all) and ClaudeBot more mildly. These are the scrapers that feed future model training, not the ones that decide whether your page gets cited when someone talks to ChatGPT today. Interesting, but not something I'd change my site over.

The result I'd actually act on is OAI-SearchBot's. That's the crawler behind ChatGPT Search; when ChatGPT goes looking for fresh information on the open web, this is what it sends. It leans Markdown by a few percentage points right now, but not by enough to be statistically confident yet. The weekly tracker further down is where we'll see if that changes.

The crawlers that actually shape live answers (ChatGPT-User and PerplexityBot) sit flat. Markdown and HTML get reached at about the same rate. Which makes sense if you think about it: at conversation time these systems are chasing the user's question, not the page's format. That flat line is the real result here. The bots that avoid Markdown are the ones that don't decide citations; the ones that do decide citations don't care how you serve the page.

[04]

What people actually ask about

/ Live retrieval follows the question, not the format

ChatGPT-User · experiment window

Live retrieval follows the question, not the format.

ChatGPT-User isn't a crawler on a schedule. It opens a page mid-conversation because someone asked ChatGPT something and the model needed a real page to answer. Across 195K of these live fetches on our site, demand tracks the topics people are actually asking about, and it spreads roughly evenly across both arms of the experiment. Format doesn't seem to matter here. The question does.

Live fetches

Unique pages

0,603

What ChatGPT pulled the most

by request volume

0,864

0,256

0,938

0,925

0,766

0,742

0,629

0,045

Sanity check: ChatGPT-User reached 83.6% of Markdown-assigned pages and 83.5% of HTML-assigned pages. That's within 0.1% of each other (p=0.946). So the category gaps above are about what people asked, not which arm of the experiment got more traffic.

[05]

Why Markdown might pull ahead anyway

/ The structural reason this line could keep moving

Markdown is much smaller

0.0%

smaller than the same page in HTML

OAI-SearchBot's Markdown lean is small for now. But here's why I think that line could keep growing: strip the nav, scripts, tracking pixels, and CSS chrome from a typical HTML page and what's left is just the answer. The Markdown version of that same page is about a fifth of the size. Cheaper to fetch, faster to parse. If retrieval cost ever becomes a routing signal inside the labs (and I'd bet it already is in at least one), that gap starts to matter.

Fetches we served as Markdown

0,289

across the randomized URLs

Fetches we served as HTML

0,917

across the randomized URLs

Total bytes saved

0.0%

vs serving everything as HTML

[06]

What to watch each week

/ The page rebuilds every Monday at 09:00 UTC

How this updates

The whole page rebuilds every Monday.

Every Monday at 09:00 UTC, the previous week's data lands and this whole page refreshes. Numbers, prose, and analysis are all pinned to the same snapshot. If you cite something here, it's tied to that week's run. The line I'm watching closest is OAI-SearchBot's Markdown lean. If it crosses into statistically significant territory, the rail below is where you'll see it happen.

Latest run

May 21, 2026

09:00 UTC

Next run

May 28, 2026

09:00 UTC

OAI-SearchBot Markdown preference · week by week

1 run so far · 11 to come

Automated weekly snapshot.

Updated Mondays

[07]

How this study works

/ Methodology and what it can't yet say

Methodology

How this is built, and what it can’t yet say.

Every Monday's run produces one snapshot that the whole page is built from. If the prose ever says something the data doesn't back, the snapshot is what's true. Below is the short version of how it works, what it can't tell you yet, and any caveats from this week.

Cadence

Mondays at 09:00 UTC

Pages

9,035

Split

50% Markdown

How it’s built

01
Eligible pages are assigned to Markdown or HTML with a deterministic hash of experiment ID and canonical path.
02
The primary endpoint is page-level coverage: the share of assigned pages each bot reached at least once.
03
Request counts, revisits, and transferred bytes are secondary outcomes because a small number of popular URLs can dominate raw volume; variant comparisons use two-proportion z-tests and are reported as percentage-point differences with p-values.
04
ChatGPT-User is separated from OAI-SearchBot because it represents user-triggered live retrieval rather than search indexing.

What it can’t say

01
The baseline event table currently reports all bot identity as user-agent-only rather than Cloudflare verified-bot enriched.
02
The first window did not serve ChatGPT-User as its own experiment cohort; it is shown as historical live-retrieval demand until the public launch.
03
The experiment measures crawl and retrieval behavior, not whether a downstream AI answer ultimately cited the page.
04
Legacy recommendation URLs are aggregated as public retrieval demand and are not presented as active experiment treatment pages.

C01
Bot identity in this baseline window is user-agent based; Cloudflare verified-bot enrichment has not yet appeared in the experiment rows.
C02
ChatGPT-User was measured as historical live-retrieval demand in the baseline and becomes a first-class interaction cohort from 2026-05-06T09:00:00Z.
C03
Coverage is page-level and randomized by canonical path, so raw request volume is reported as a secondary signal.

ReferencesFurther reading

OpenAI crawler docs

GPTBot · ChatGPT-User · OAI-SearchBot

Crawler behavior study

How AI bots discover and traverse pages

More Trakkr Research

Studies 001-010

Last snapshot · May 21, 2026, 1:14 PM GMT+1Study 010 · Trakkr Research