
Different Models, Different Favorites
ChatGPT, Gemini, and Claude don't recommend the same brands. Academic studies across 567K+ samples show systematic model-level preferences - and alignment training makes it worse.
Here's a question that doesn't get asked enough: when you optimize your brand for "AI visibility," which AI are you optimizing for?
The implicit assumption in most discussions about AI search is that the models are roughly interchangeable. That if ChatGPT recommends you, Gemini probably does too. That "AI visibility" is one thing, not eight different things.
The academic evidence says otherwise. And the numbers aren't even close.
The 567K experiment
In early 2025, a team of researchers ran one of the largest studies on LLM product preferences to date. They generated 567,000 product recommendation samples across five major language models, all answering the same investment-related product questions.
Different LLMs recommend distinct products with remarkably low overlap. 567K samples across models confirm systematic, model-specific preferences that persist across prompt variations.
Xi'an Jiaotong University, 2025

The headline finding: different models recommend different products. Not slightly different - substantially different, with low overlap between their preferred choices. Each model has its own set of "favorites" that it returns to consistently, and these favorites differ meaningfully from model to model.
Put differently: a brand that's highly visible on ChatGPT might be nearly invisible on Claude, and vice versa. Not because of anything the brand did differently, but because of how each model's training and alignment shaped its preferences.
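To make "low overlap" concrete, overlap between two models' favorite sets is naturally scored as Jaccard similarity. The sketch below uses invented model and brand names, not data from the study:

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 means identical favorites, 0.0 none shared."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical top-recommended brands per model (illustrative only).
favorites = {
    "model_a": {"BrandX", "BrandY", "BrandZ"},
    "model_b": {"BrandX", "BrandQ", "BrandR"},
    "model_c": {"BrandQ", "BrandS", "BrandT"},
}

names = sorted(favorites)
for i, m1 in enumerate(names):
    for m2 in names[i + 1:]:
        print(m1, m2, round(jaccard(favorites[m1], favorites[m2]), 2))
```

In this toy data, no pair of models shares more than one favorite, so every pairwise score sits at 0.2 or below, which is the kind of low-overlap picture the study describes.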
We've seen the same pattern in our own data. Across 920K model comparisons, the average agreement rate is 43.9%. Full consensus - all models recommending the same brand - happens only 4% of the time. That's not noise. That's a structural feature of the landscape.
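Metrics like these can be computed from per-prompt recommendations. Here is a minimal sketch (hypothetical model names and outputs, not our actual pipeline) of a pairwise agreement rate and a full-consensus rate:

```python
from itertools import combinations

# Hypothetical top recommendation from each model, per prompt.
runs = [
    {"gpt": "BrandX", "gemini": "BrandX", "claude": "BrandY"},
    {"gpt": "BrandX", "gemini": "BrandZ", "claude": "BrandZ"},
    {"gpt": "BrandQ", "gemini": "BrandQ", "claude": "BrandQ"},
]

def agreement_rate(runs: list[dict]) -> float:
    """Fraction of model pairs, per prompt, that name the same brand."""
    agree = total = 0
    for run in runs:
        for a, b in combinations(run.values(), 2):
            total += 1
            agree += (a == b)
    return agree / total

def consensus_rate(runs: list[dict]) -> float:
    """Fraction of prompts where every model names the same brand."""
    return sum(len(set(run.values())) == 1 for run in runs) / len(runs)
```

With the toy data above, `agreement_rate` lands at 5/9 and `consensus_rate` at 1/3: most pairwise comparisons disagree even though one prompt reaches full consensus, which mirrors the gap between the 43.9% and 4% figures.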
Why RLHF makes it worse
To understand why models diverge, you need to understand RLHF - reinforcement learning from human feedback. It's the process that makes models "helpful" and "safe." But it also narrows their recommendation pool in ways that have measurable consequences for brand visibility.
RLHF/DPO alignment reduces output diversity, causing models to overweight majority preferences. This narrows the pool of recommended brands and reinforces incumbents.
MIT CSAIL / ICLR 2025

The mechanism is intuitive once you see it. During RLHF, human raters label which responses are "better." These raters tend to prefer responses that mention well-known brands - because those answers feel more authoritative and trustworthy. The model learns this preference and amplifies it. Over time, dominant brands become more dominant, and the long tail of niche alternatives gets compressed.
The crucial insight: each model company uses different RLHF datasets, different rater pools, and different alignment objectives. OpenAI's alignment process produces different brand preferences than Anthropic's, which produces different preferences than Google's. The alignment step is where model-specific biases get baked in.
The pro-AI bias
There's one bias that cuts across all models: they disproportionately recommend AI-powered tools and solutions. Ask for career advice, investment recommendations, or software suggestions, and AI-related options appear far more often than their market share would justify.
LLMs disproportionately recommend AI-related options across investment, career, study, and startup advice. In proprietary models, the bias is near-deterministic.
Bar-Ilan University, 2026

The irony is obvious: AI models think AI is the answer to everything. For brands in the AI space, this is a tailwind. For brands outside it, it's worth understanding that you're competing not just against your direct competitors, but against the model's structural preference for AI-adjacent solutions.
What this means for tracking
If you're only monitoring one AI model, you're seeing a fraction of your visibility landscape. A brand might look healthy on ChatGPT and be struggling on Gemini. Or vice versa. The only way to know is to track across models.
The practical implication
Model divergence isn't a bug to be fixed - it's a structural feature of having multiple AI systems trained by different companies with different objectives. It means "AI visibility" is really eight different visibility metrics, and they don't move together.
The academic research confirms what our data shows: multi-model tracking isn't a nice-to-have. Each model is its own ecosystem with its own preferences, its own biases, and its own trajectory. A brand strategy that accounts for this divergence will outperform one that treats "AI" as a monolith.
