Models

The frontier models behind AI search, and how rarely they agree on who to recommend. Every brand ChatGPT, Claude, Gemini, Perplexity and more name for the same question, compared head to head.

Updated Mar 11, 2026 · 44K reports · 825K prompts · 6.4M answers

Models tracked

8

frontier engines

Average agreement

43.3%

when two models answer the same question

Unanimous answers

4.0%

every model names the same brand

Prompts compared

825K

head to head

The models we track

Eight frontier engines, ranked by how often each one returns a brand answer. Each shows the model it lines up with most.

Engine · Model · Answers brand questions · Closest to
Meta AI · Llama 4 Maverick · 95.0% · Claude (23%)
ChatGPT · GPT 5.5 · 85.4% · Claude (27%)
Grok · Grok 4.20 · 83.0% · Claude (35%)
Gemini · Gemini 3.5 Flash · 82.2% · Claude (26%)
DeepSeek · DeepSeek V4 · 80.9% · Claude (35%)
Claude · Claude Opus 4.8 · 79.9% · DeepSeek (35%)
Perplexity · Perplexity Sonar · 79.4% · AI Overviews (17%)
AI Overviews · - · 56.5% · ChatGPT (20%)

Who agrees with whom

Every pair of models, by how often they name the same brand for a question. Greener means they agree more.

Heatmap: pairwise agreement among ChatGPT, Claude, Gemini, Perplexity, DeepSeek, Grok, Meta AI and AI Overviews. Each cell is the share of questions where both models name the same brand.

Agree most: Claude + DeepSeek, 35%

Agree least: Perplexity + Meta AI, 10%

Across all pairs: 20% average agreement

Claude sits closest to the pack; Perplexity is the biggest outlier. Win on Perplexity and you reach an audience the others miss.
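As a rough illustration of how a pairwise matrix like this could be computed, here is a minimal sketch. The sample data, the `pairwise_agreement` function and the choice to skip questions where either model returned no brand are all assumptions made for the example, not the study's actual pipeline.

```python
from itertools import combinations

# Hypothetical sample: each question maps a model name to the brand it named
# (None when the model returned no brand). This is not data from the study.
answers = [
    {"ChatGPT": "Asana", "Claude": "ClickUp", "DeepSeek": "ClickUp"},
    {"ChatGPT": "HubSpot", "Claude": "HubSpot", "DeepSeek": "HubSpot"},
    {"ChatGPT": "Xero", "Claude": "FreshBooks", "DeepSeek": None},
]


def pairwise_agreement(questions):
    """Share of questions where both models in a pair name the same brand.

    Assumption: a question only counts toward a pair when both models
    returned a brand for it.
    """
    models = sorted({model for q in questions for model in q})
    matrix = {}
    for a, b in combinations(models, 2):
        both = [(q[a], q[b]) for q in questions if q.get(a) and q.get(b)]
        if both:
            same = sum(1 for x, y in both if x == y)
            matrix[(a, b)] = same / len(both)
    return matrix


matrix = pairwise_agreement(answers)
for pair, share in sorted(matrix.items(), key=lambda kv: -kv[1]):
    print(pair, f"{share:.0%}")
```

Sorting the pairs by share surfaces the "agree most" and "agree least" pairs directly; averaging the values gives an across-all-pairs figure under the same assumptions.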

How often do they line up

The spread of agreement across every comparison. Most questions land in the middle; near-unanimity is rare.

0-25%: 14.6% (117K)
25-50%: 45.1% (360K)
50-75%: 28.0% (223K)
75-99%: 8.3% (66K)
100%: 4.0% (32K)

Agreement is the share of a question's brand picks that the two models have in common. The long middle is the real story: the models mostly half-overlap, so where you rank depends on which engine a buyer asks.
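One way to picture this: score each comparison by how much two shortlists overlap, then sort the scores into the five bands listed above. The sketch below assumes a Jaccard-style overlap (shared brands divided by all brands either model named) and assumes each band includes its lower bound; the page does not state the exact formula or boundary handling, and the sample shortlists are made up.

```python
from collections import Counter


def overlap(picks_a, picks_b):
    """Jaccard overlap of two brand shortlists (an assumed metric)."""
    a, b = set(picks_a), set(picks_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bucket(score):
    """Map a 0..1 score to one of the five agreement bands.

    Assumption: each band includes its lower bound.
    """
    if score >= 1.0:
        return "100%"
    if score >= 0.75:
        return "75-99%"
    if score >= 0.50:
        return "50-75%"
    if score >= 0.25:
        return "25-50%"
    return "0-25%"


# Hypothetical shortlists from two models for three questions.
comparisons = [
    (["Asana", "ClickUp", "Trello"], ["ClickUp", "Trello", "Notion"]),
    (["HubSpot"], ["HubSpot"]),
    (["Xero", "Wave"], ["Sage"]),
]

histogram = Counter(bucket(overlap(a, b)) for a, b in comparisons)
print(histogram)
```

In this toy sample the first pair shares two of four distinct brands, so it scores 0.5 and lands in the 50-75% band.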

Where they split

Cross-model agreement by the kind of question asked. Tight, comparative questions converge; open ones scatter.

Question type · Agreement · Volume
Comparison · 50.4% · 34K
How-to · 45.3% · 21K
Alternatives · 44.1% · 7.6K
Best-of · 43.4% · 375K
Recommendation · 43.1% · 43K
General · 42.2% · 317K

See it in the wild

Real questions from the study - where the models split on who to recommend, and the rare ones where they all agree.

“best project management software for a fast-growing remote team”

13% agree

ChatGPT: Asana
Claude: ClickUp
Gemini: Monday.com
Perplexity: Linear
DeepSeek: Trello
Grok: Notion
Meta AI: Wrike
AI Overviews: Jira

“which CRM should a small B2B sales team use”

13% agree

ChatGPT: HubSpot
Claude: Pipedrive
Gemini: Zoho CRM
Perplexity: Freshsales
DeepSeek: Folk
Grok: Salesforce
Meta AI: Close
AI Overviews: Copper

“compare accounting tools for freelancers and sole traders”

13% agree

ChatGPT: QuickBooks
Claude: FreshBooks
Gemini: Xero
Perplexity: Sage
DeepSeek: Bonsai
Grok: Wave
Meta AI: Zoho Books
AI Overviews: FreeAgent

“good email marketing platform for an online store”

13% agree

ChatGPT: Klaviyo
Claude: Mailchimp
Gemini: Brevo
Perplexity: ConvertKit
DeepSeek: ActiveCampaign
Grok: Omnisend
Meta AI: Drip
AI Overviews: MailerLite

“best password manager for a small business”

13% agree

ChatGPT: 1Password
Claude: Bitwarden
Gemini: Dashlane
Perplexity: RoboForm
DeepSeek: Keeper
Grok: NordPass
Meta AI: LastPass
AI Overviews: Proton Pass

“what is a good website builder for a portfolio site”

13% agree

ChatGPT: Squarespace
Claude: Webflow
Gemini: Wix
Perplexity: Format
DeepSeek: WordPress
Grok: Framer
Meta AI: Carrd
AI Overviews: Pixpa


Methodology

The study gives the same prompts to every model and compares the brands each one returns. It covers 599K high-quality comparisons across 44K reports, from Aug 12, 2025 to Mar 11, 2026. Agreement is the overlap of brand picks between two models; appearance is how often a model returns a usable brand answer.
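The appearance measure can be sketched in a few lines. Here a response counts as a "usable brand answer" when it contains at least one brand; that threshold, the `appearance_rate` name and the sample responses are assumptions for illustration, since the page does not spell out the exact rule.

```python
def appearance_rate(responses):
    """Fraction of responses that contain at least one brand.

    Assumption: any non-empty brand list counts as a usable brand answer.
    """
    if not responses:
        return 0.0
    answered = sum(1 for brands in responses if brands)
    return answered / len(responses)


# Hypothetical brand lists extracted from one model's answers to four prompts.
responses = [["Klaviyo"], [], ["Mailchimp", "Brevo"], ["Drip"]]
rate = appearance_rate(responses)
print(f"{rate:.1%}")
```

Three of the four sample responses name a brand, so the rate here is 0.75.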

Go deeper

The full model divergence study
The State of AI Search

Trakkr model divergence study · CC BY 4.0

Common questions

Do AI models recommend the same brands?

Rarely all at once. When the same prompt is put to ChatGPT, Claude, Gemini, Perplexity and the others, full agreement on the recommended brands is the exception - most queries produce meaningfully different shortlists across models.

Which AI models does Trakkr track?

Eight: ChatGPT, Claude, Gemini, Perplexity, Google AI Overviews, Grok, DeepSeek and Meta AI.

Why do AI models disagree on recommendations?

They are trained on different data, retrieve from different sources, and weight authority differently. This dataset quantifies the disagreement with a pairwise agreement matrix and divergence by query type.