Trakkr Data

Models

The frontier models behind AI search, and how rarely they agree on who to recommend. Every brand ChatGPT, Claude, Gemini, Perplexity and more name for the same question, compared head to head.

Updated Mar 11, 2026 · 44K reports · 825K prompts · 6.4M answers

Models tracked: 8 frontier engines
Average agreement: 43.3% when two models answer the same question
Unanimous answers: 4.0% (every model names the same brand)
Prompts compared: 825K, head to head

The models we track

Eight frontier engines, ranked by how often each one returns a brand answer. Each shows the model it lines up with most.

Meta AI (Llama 4 Maverick): answers 95.0% of brand questions; closest to Claude (23%)
ChatGPT (GPT 5.5): answers 85.4% of brand questions; closest to Claude (27%)
Grok (Grok 4.20): answers 83.0% of brand questions; closest to Claude (35%)
Gemini (Gemini 3.5 Flash): answers 82.2% of brand questions; closest to Claude (26%)
DeepSeek (DeepSeek V4): answers 80.9% of brand questions; closest to Claude (35%)
Claude (Claude Opus 4.8): answers 79.9% of brand questions; closest to DeepSeek (35%)
Perplexity (Perplexity Sonar): answers 79.4% of brand questions; closest to AI Overviews (17%)
AI Overviews (Search): answers 56.5% of brand questions; closest to ChatGPT (20%)

Who agrees with whom

Every pair of models, by how often they name the same brand for a question. Greener means they agree more.

[Heatmap: pairwise agreement between ChatGPT, Claude, Gemini, Perplexity, DeepSeek, Grok, Meta AI and AI Overviews, shaded from less to more agreement.]
Numbers are the share of questions where both name the same brand.
Agree most: Claude + DeepSeek, 35%
Agree least: Perplexity + Meta AI, 10%
Across all pairs: 20% average agreement

Claude sits closest to the pack; Perplexity is the biggest outlier. Win there and you reach an audience the others miss.
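A minimal sketch of how a pairwise matrix like this could be computed. It assumes each model contributes one top brand per question and that a pair is scored only on questions both models answered. The data, `pairwise_agreement` and `closest_partner` are hypothetical illustrations, not Trakkr's actual pipeline.

```python
from itertools import combinations

# Hypothetical toy data: question -> {model: top brand, or None if no brand was named}
answers = {
    "q1": {"ChatGPT": "Asana", "Claude": "ClickUp", "DeepSeek": "ClickUp"},
    "q2": {"ChatGPT": "HubSpot", "Claude": "HubSpot", "DeepSeek": "HubSpot"},
    "q3": {"ChatGPT": "Xero", "Claude": "FreshBooks", "DeepSeek": None},
    "q4": {"ChatGPT": "Klaviyo", "Claude": "Mailchimp", "DeepSeek": "Mailchimp"},
}

def pairwise_agreement(answers):
    """Share of questions, among those both models answered, with the same brand."""
    models = sorted({m for picks in answers.values() for m in picks})
    result = {}
    for a, b in combinations(models, 2):
        both = [
            (picks.get(a), picks.get(b))
            for picks in answers.values()
            if picks.get(a) is not None and picks.get(b) is not None
        ]
        if both:  # skip pairs that never answered the same question
            result[(a, b)] = sum(x == y for x, y in both) / len(both)
    return result

def closest_partner(matrix, model):
    """The other model with the highest agreement with `model`."""
    partners = {}
    for (a, b), score in matrix.items():
        if a == model:
            partners[b] = score
        elif b == model:
            partners[a] = score
    return max(partners, key=partners.get)

matrix = pairwise_agreement(answers)
agree_most = max(matrix, key=matrix.get)
agree_least = min(matrix, key=matrix.get)
print(agree_most, agree_least, closest_partner(matrix, "Claude"))
```

On the toy data, Claude and DeepSeek agree on every shared question while ChatGPT and Claude agree on one in four, so the same two helpers yield both the "agree most / agree least" pairs and each model's closest partner.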

How often do they line up

The spread of agreement across every comparison. Most questions land in the middle; near-unanimity is rare.

0–25% agreement: 14.6%
25–50% agreement: 45.1%
50–75% agreement: 28.0%
75–99% agreement: 8.3%
100% agreement: 4.0%

Agreement is the share of a question's brand picks the two models share. The long middle is the real story: the models mostly half-overlap, so where you rank depends on which engine a buyer asks.
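One way to read that definition is as a Jaccard overlap between two models' brand shortlists, which can then be sorted into the bands shown in the chart. The sketch below works under that assumption. The source does not state the exact formula or how band edges are handled, so `overlap`, `bucket` and the boundary rule (a score of exactly 50% falls in the 50–75% band) are illustrative choices.

```python
def overlap(picks_a, picks_b):
    """Jaccard overlap of two brand shortlists (an assumed reading of the definition)."""
    a, b = set(picks_a), set(picks_b)
    if not a or not b:
        return None  # no comparison is possible when a model names no brand
    return len(a & b) / len(a | b)

def bucket(score):
    """Map an agreement score in [0, 1] to a chart band (edge handling is assumed)."""
    pct = score * 100
    if pct >= 100:
        return "100%"
    if pct >= 75:
        return "75–99%"
    if pct >= 50:
        return "50–75%"
    if pct >= 25:
        return "25–50%"
    return "0–25%"

# Hypothetical shortlists from two models for the same question
half = overlap(["Asana", "ClickUp", "Trello"], ["ClickUp", "Trello", "Notion"])
print(half, bucket(half))  # two shared brands out of four distinct ones
print(bucket(overlap(["HubSpot"], ["HubSpot"])))
print(bucket(overlap(["Xero"], ["Wave"])))
```

Two shortlists of three that share two brands score 0.5 under this reading, which is the kind of half-overlap that dominates the distribution.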

Where they split

Cross-model agreement by the kind of question asked. Tight, comparative questions converge; open ones scatter.

Comparison: 50.4% agreement (34K)
How-to: 45.3% agreement (21K)
Alternatives: 44.1% agreement (7.6K)
Best-of: 43.4% agreement (375K)
Recommendation: 43.1% agreement (43K)
General: 42.2% agreement (317K)
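As a rough consistency check, the per-type figures above can be combined into a single count-weighted average. The sketch assumes the counts beside each type are the number of prompts of that type and can serve as weights. The source does not label them, so this is an approximation rather than the study's own calculation.

```python
# Figures copied from the breakdown above: type -> (agreement %, count in thousands).
# Treating the counts as weights is an assumption; the source does not label them.
by_type = {
    "Comparison": (50.4, 34),
    "How-to": (45.3, 21),
    "Alternatives": (44.1, 7.6),
    "Best-of": (43.4, 375),
    "Recommendation": (43.1, 43),
    "General": (42.2, 317),
}

total = sum(count for _, count in by_type.values())
weighted_mean = sum(pct * count for pct, count in by_type.values()) / total

print(f"total count: {total:.1f}K")
print(f"weighted agreement: {weighted_mean:.1f}%")
```

With these rounded inputs the weighted figure comes to about 43.3%, in line with the headline average agreement. Best-of and General questions carry most of the weight, which is why the overall number sits close to theirs. The counts sum to roughly 798K, a little under the 825K prompts reported, so the check is approximate.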

See it in the wild

Real questions from the study — where the models split on who to recommend, and the rare ones where they all agree.

“best project management software for a fast-growing remote team”

13% agree
ChatGPT: Asana · Claude: ClickUp · Gemini: Monday.com · Perplexity: Linear · DeepSeek: Trello · Grok: Notion · Meta AI: Wrike · AI Overviews: Jira

“which CRM should a small B2B sales team use”

13% agree
ChatGPT: HubSpot · Claude: Pipedrive · Gemini: Zoho CRM · Perplexity: Freshsales · DeepSeek: Folk · Grok: Salesforce · Meta AI: Close · AI Overviews: Copper

“compare accounting tools for freelancers and sole traders”

13% agree
ChatGPT: QuickBooks · Claude: FreshBooks · Gemini: Xero · Perplexity: Sage · DeepSeek: Bonsai · Grok: Wave · Meta AI: Zoho Books · AI Overviews: FreeAgent

“good email marketing platform for an online store”

13% agree
ChatGPT: Klaviyo · Claude: Mailchimp · Gemini: Brevo · Perplexity: ConvertKit · DeepSeek: ActiveCampaign · Grok: Omnisend · Meta AI: Drip · AI Overviews: MailerLite

“best password manager for a small business”

13% agree
ChatGPT: 1Password · Claude: Bitwarden · Gemini: Dashlane · Perplexity: RoboForm · DeepSeek: Keeper · Grok: NordPass · Meta AI: LastPass · AI Overviews: Proton Pass

“what is a good website builder for a portfolio site”

13% agree
ChatGPT: Squarespace · Claude: Webflow · Gemini: Wix · Perplexity: Format · DeepSeek: WordPress · Grok: Framer · Meta AI: Carrd · AI Overviews: Pixpa

Methodology

Built by giving the same prompts to every model and comparing the brands each one returns. 599K high-quality comparisons across 44K reports, Aug 12, 2025 to Mar 11, 2026. Agreement is the overlap of brand picks between two models; appearance is how often a model returns a usable brand answer.
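The appearance metric described here can be sketched as a simple ratio: prompts where a model returned at least one brand, divided by prompts it was asked. The response log and the `appearance_rate` helper below are hypothetical, and what counts as a "usable" answer in the study may be stricter than a non-empty list.

```python
# Hypothetical log: one entry per prompt, model -> brands returned
# (an empty list stands for "no usable brand answer")
responses = [
    {"Meta AI": ["Wrike"], "AI Overviews": []},
    {"Meta AI": ["Close"], "AI Overviews": ["Copper"]},
    {"Meta AI": ["Drip"], "AI Overviews": []},
    {"Meta AI": [], "AI Overviews": ["Pixpa"]},
]

def appearance_rate(responses):
    """Share of prompts for which each model returned at least one brand."""
    asked, answered = {}, {}
    for prompt in responses:
        for model, brands in prompt.items():
            asked[model] = asked.get(model, 0) + 1
            answered[model] = answered.get(model, 0) + (1 if brands else 0)
    return {model: answered[model] / asked[model] for model in asked}

rates = appearance_rate(responses)
ranking = sorted(rates, key=rates.get, reverse=True)  # highest appearance first
print(rates, ranking)
```

Sorting by this rate gives the kind of ordering used in the list of tracked models, where engines that almost always name a brand come first.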

Trakkr model divergence study · CC BY 4.0

Common questions

Do AI models recommend the same brands?

Rarely all at once. When the same prompt is put to ChatGPT, Claude, Gemini, Perplexity and the others, full agreement on the recommended brands is the exception — most queries produce meaningfully different shortlists across models.

Which AI models does Trakkr track?

Eight: ChatGPT, Claude, Gemini, Perplexity, Google AI Overviews, Grok, DeepSeek and Meta AI.

Why do AI models disagree on recommendations?

They are trained on different data, retrieve from different sources, and weight authority differently. This dataset quantifies the disagreement with a pairwise agreement matrix and divergence by query type.