
Different Models, Different Favorites
ChatGPT, Gemini, and Claude don't recommend the same brands. Academic studies across 567K+ samples show systematic model-level preferences - and alignment training makes it worse.
Here's a question that doesn't get asked enough: when you optimize your brand for "AI visibility," which AI are you optimizing for?
The implicit assumption in most discussions about AI search is that the models are roughly interchangeable. That if ChatGPT recommends you, Gemini probably does too. That "AI visibility" is one thing, not eight different things.
The academic evidence says otherwise. And the numbers aren't even close.
The 567K experiment
In early 2025, a team of researchers ran one of the largest studies on LLM product preferences to date. They generated 567,000 product recommendation samples across five major language models, all answering the same investment-related product questions.
Different LLMs recommend distinct products with remarkably low overlap. 567K samples across models confirm systematic, model-specific preferences that persist across prompt variations.
Xi'an Jiaotong University, 2025

The headline finding: different models recommend different products. Not slightly different - substantially different, with low overlap between their preferred choices. Each model has its own set of "favorites" that it returns to consistently, and these favorites differ meaningfully from model to model.
Put differently: a brand that's highly visible on ChatGPT might be nearly invisible on Claude, and vice versa. Not because of anything the brand did differently, but because of how each model's training and alignment shaped its preferences.
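To make "low overlap" concrete, overlap between two models' favorite sets is naturally scored as Jaccard similarity. The sketch below uses invented model and brand names, not data from the study:

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 means identical favorites, 0.0 none shared."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical top-recommended brands per model (illustrative only).
favorites = {
    "model_a": {"BrandX", "BrandY", "BrandZ"},
    "model_b": {"BrandX", "BrandQ", "BrandR"},
    "model_c": {"BrandQ", "BrandS", "BrandT"},
}

names = sorted(favorites)
for i, m1 in enumerate(names):
    for m2 in names[i + 1:]:
        print(m1, m2, round(jaccard(favorites[m1], favorites[m2]), 2))
```

In this toy data, no pair of models shares more than one favorite, so every pairwise score sits at 0.2 or below, which is the kind of low-overlap picture the study describes.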
We've seen the same pattern in our own data. Across 920K model comparisons, the average agreement rate is 43.9%. Full consensus - all models recommending the same brand - happens only 4% of the time. That's not noise. That's a structural feature of the landscape.
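Metrics like these can be computed from per-prompt recommendations. Here is a minimal sketch (hypothetical model names and outputs, not our actual pipeline) of a pairwise agreement rate and a full-consensus rate:

```python
from itertools import combinations

# Hypothetical top recommendation from each model, per prompt.
runs = [
    {"gpt": "BrandX", "gemini": "BrandX", "claude": "BrandY"},
    {"gpt": "BrandX", "gemini": "BrandZ", "claude": "BrandZ"},
    {"gpt": "BrandQ", "gemini": "BrandQ", "claude": "BrandQ"},
]

def agreement_rate(runs: list[dict]) -> float:
    """Fraction of model pairs, per prompt, that name the same brand."""
    agree = total = 0
    for run in runs:
        for a, b in combinations(run.values(), 2):
            total += 1
            agree += (a == b)
    return agree / total

def consensus_rate(runs: list[dict]) -> float:
    """Fraction of prompts where every model names the same brand."""
    return sum(len(set(run.values())) == 1 for run in runs) / len(runs)
```

With the toy data above, `agreement_rate` lands at 5/9 and `consensus_rate` at 1/3: most pairwise comparisons disagree even though one prompt reaches full consensus, which mirrors the gap between the 43.9% and 4% figures.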
Why RLHF makes it worse
To understand why models diverge, you need to understand RLHF - reinforcement learning from human feedback. It's the process that makes models "helpful" and "safe." But it also narrows their recommendation pool in ways that have measurable consequences for brand visibility.
RLHF/DPO alignment reduces output diversity, causing models to overweight majority preferences. This narrows the pool of recommended brands and reinforces incumbents.
MIT CSAIL / ICLR 2025

The mechanism is intuitive once you see it. During RLHF, human raters label which responses are "better." These raters tend to prefer responses that mention well-known brands - because those answers feel more authoritative and trustworthy. The model learns this preference and amplifies it. Over time, dominant brands become more dominant, and the long tail of niche alternatives gets compressed.
The crucial insight: each model company uses different RLHF datasets, different rater pools, and different alignment objectives. OpenAI's alignment process produces different brand preferences than Anthropic's, which produces different preferences than Google's. The alignment step is where model-specific biases get baked in.
The pro-AI bias
There's one bias that cuts across all models: they disproportionately recommend AI-powered tools and solutions. Ask for career advice, investment recommendations, or software suggestions, and AI-related options appear far more often than their market share would justify.
LLMs disproportionately recommend AI-related options across investment, career, study, and startup advice. In proprietary models, the bias is near-deterministic.
Bar-Ilan University, 2026

The irony is obvious: AI models think AI is the answer to everything. For brands in the AI space, this is a tailwind. For brands outside it, it's worth understanding that you're competing not just against your direct competitors, but against the model's structural preference for AI-adjacent solutions.
What this means for tracking
If you're only monitoring one AI model, you're seeing a fraction of your visibility landscape. A brand might look healthy on ChatGPT and be struggling on Gemini. Or vice versa. The only way to know is to track across models.
The practical implication
Model divergence isn't a bug to be fixed - it's a structural feature of having multiple AI systems trained by different companies with different objectives. It means "AI visibility" is really eight different visibility metrics, and they don't move together.
The academic research confirms what our data shows: multi-model tracking isn't a nice-to-have. Each model is its own ecosystem with its own preferences, its own biases, and its own trajectory. A brand strategy that accounts for this divergence will outperform one that treats "AI" as a monolith.
