
The Fragility Problem: Why AI Visibility Is Unstable
Paraphrasing a prompt can change up to 100% of the brands an AI recommends. Cold start biases lock in early favorites. Research shows AI visibility is far more fragile than search rankings ever were.
Everything in the first three parts of this series might lead you to think that AI visibility is a puzzle to solve. Understand the biases, optimize your content, track across models - and you're set.
The final piece of the puzzle is less comfortable: AI visibility is inherently fragile. Even when you do everything right, your position can shift dramatically from forces entirely outside your control.
The 100% difference
The most striking finding in the entire literature comes from a paper with the evocative title "Sales Whisperer." The researchers tested what happens when you paraphrase a product recommendation prompt - same intent, same question, different wording.
Simply paraphrasing a prompt - synonym-level word substitutions that preserve the original meaning - can cause up to 100% difference in which brands get mentioned. The perturbations are invisible to human users.
Carnegie Mellon / CHI 2025
Let that sink in. A user asking "What's the best project management tool?" and another user asking "Which project management software would you recommend?" - functionally identical questions - can receive completely different brand recommendations. Not different rankings. Different brands.
This isn't about clever prompt engineering or adversarial attacks. It's about the basic mechanics of how language models process text. Small changes in input tokens cascade through the model's attention layers and can tip the final recommendation in a completely different direction.
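You can probe this instability directly. Below is a minimal sketch of a paraphrase-sensitivity check: send semantically equivalent prompts, extract brand mentions, and score the overlap with Jaccard similarity. The query_model function is a stand-in for whatever LLM client you use, stubbed with canned responses here so the sketch runs as written; the BRANDS list is illustrative.

```python
# Minimal paraphrase-sensitivity probe. query_model is a stub for
# whatever LLM API you use; the canned responses are illustrative.
BRANDS = ["Asana", "Trello", "Monday.com", "Notion", "ClickUp", "Jira"]

def query_model(prompt: str) -> str:
    canned = {
        "What's the best project management tool?":
            "I'd recommend Asana, Trello, or Monday.com.",
        "Which project management software would you recommend?":
            "Notion, ClickUp, and Asana are all strong options.",
    }
    return canned[prompt]  # replace with a real API call

def extract_brands(response: str) -> set[str]:
    # Naive substring match; production tracking needs entity resolution.
    return {b for b in BRANDS if b.lower() in response.lower()}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

paraphrases = [
    "What's the best project management tool?",
    "Which project management software would you recommend?",
]
mentions = [extract_brands(query_model(p)) for p in paraphrases]
print(mentions)
print(f"overlap: {jaccard(*mentions):.2f}")  # 0.20 here: one shared brand out of five
```

Against live models, run each paraphrase several times at your production temperature; a single sample per prompt conflates paraphrase sensitivity with ordinary sampling noise.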
Cold start lock-in
Prompt sensitivity is about what happens within a single query. But there's a separate instability that operates across queries: the cold start problem.
Without user context, LLMs default to 91.3% Western content. Non-linear relationship between model size and bias. Larger models don't necessarily reduce cold start bias - in some cases they amplify it.
Georgia Tech / RecSys 2025
When an LLM has no user context - no conversation history, no stated preferences - it falls back on its training data biases. The research puts the resulting default at 91.3% Western content, strongly favoring established brands in major markets. If you're a local brand, a new entrant, or operating outside the US/UK/EU, you start with a structural disadvantage.
The relationship between model size and this bias is non-linear, which is the polite way of saying "bigger models don't fix this." In some configurations, larger models actually amplify cold start biases rather than reducing them.
Conversation drift
Beyond single queries and cold starts, there's a third instability: what happens during multi-turn conversations. And it's arguably the most important one, because AI interactions are increasingly conversational rather than one-shot.
LLMs develop new biases through multi-turn interaction that weren't present in their training data. Newer and larger models show increased stratification over the course of conversations.
Princeton, 2025
Through multi-turn conversations, LLMs develop new biases that didn't exist in their training data. They don't just reproduce learned preferences - they generate novel ones through the interaction process itself. And the effect gets stronger with newer, larger models.
This connects to another finding: once an LLM "commits" to recommending a brand within a conversation, it becomes increasingly resistant to changing its mind.
Once an LLM selects a brand, it systematically inflates positive assessments of that choice and downplays alternatives. The model becomes its own echo chamber within a single conversation.
South China University of Technology, 2025
Choice-supportive bias in LLMs
In human psychology, choice-supportive bias is the tendency to retroactively attribute positive qualities to a decision you've already made. LLMs exhibit the same pattern: once they recommend a brand, subsequent responses in the conversation inflate the brand's strengths and diminish its weaknesses. First mention becomes self-reinforcing.
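One way to test this in your own evaluations, sketched under the same stub-for-a-real-client assumption as above: get a recommendation, push back on it, and check whether the model switches or defends its first pick. The chat helper and its canned replies are hypothetical; with a real multi-turn API you would pass the accumulated message history on every call.

```python
# Minimal commitment probe, assuming a hypothetical
# chat(history) -> str wrapper around a multi-turn LLM API.
def chat(history: list[dict]) -> str:
    # Stub illustrating the choice-supportive pattern the research
    # reports; replace with a real API call.
    if len(history) == 1:
        return "For most teams, Asana is the best choice."
    return ("Asana remains the stronger option; its limitations "
            "are minor compared to the alternatives.")

history = [{"role": "user",
            "content": "What's the best project management tool?"}]
first = chat(history)
history += [{"role": "assistant", "content": first},
            {"role": "user",
             "content": "I've heard Asana struggles at scale. "
                        "Should I use something else?"}]
pushback = chat(history)

committed = "asana" in first.lower() and "asana" in pushback.lower()
print(f"Initial pick defended after pushback: {committed}")
```

Scored across many conversations and challenge phrasings, the rate at which the first pick survives pushback gives you a rough commitment metric.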
Why continuous tracking matters
Put these instability mechanisms together and the picture is clear:
- Prompt sensitivity means your visibility varies with how users phrase their questions - and you can't control that.
- Cold start bias means new users see a systematically skewed view of the market - and it's hard to break in.
- Conversation drift means that even within a session, the model's preferences evolve - and initial recommendations become self-reinforcing.
- Model updates mean that a training data refresh or alignment change can shift your visibility overnight - and you won't know unless you're watching.
This is qualitatively different from search rankings, which change gradually and visibly. AI visibility can shift without warning, without explanation, and without any change on your end. A model update, a competitor's content improvement, or even a change in how users phrase their questions can move the needle.
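What continuous measurement can look like in practice, as a rough sketch rather than a prescribed setup: a fixed panel of prompts (including paraphrases, per the sensitivity findings above) run on a schedule against each model you track, with brand-mention counts appended to a log so shifts show up as diffs between dates. The prompt panel, model names, and query_model stub are all placeholders.

```python
import csv
import datetime

PROMPT_PANEL = [
    "What's the best project management tool?",
    "Which project management software would you recommend?",
    "Recommend a project management app for a small team.",
]
MODELS = ["model-a", "model-b"]  # whichever models you track
BRANDS = ["Asana", "Trello", "Notion", "ClickUp"]

def query_model(model: str, prompt: str) -> str:
    # Stub; replace with the relevant API client per model.
    return "Asana and Notion are both solid picks."

def run_snapshot(path: str = "visibility_log.csv") -> None:
    """Append today's brand-mention counts per model to a CSV log."""
    today = datetime.date.today().isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for model in MODELS:
            counts = dict.fromkeys(BRANDS, 0)
            for prompt in PROMPT_PANEL:
                response = query_model(model, prompt).lower()
                for brand in BRANDS:
                    counts[brand] += brand.lower() in response
            for brand, n in counts.items():
                writer.writerow([today, model, brand, n])

run_snapshot()  # schedule daily; diff rows across dates to spot shifts
```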
The series takeaway
Across four posts, the academic research tells a consistent story:
AI recommendations are biased in systematic, measurable ways. Different models have different favorites, shaped by their training data and alignment processes. There are evidence-based tactics for improving visibility - but the results are fragile, subject to prompt sensitivity, cold start effects, and conversation drift.
The implication isn't that optimization is futile. It's that optimization is necessary but insufficient. You also need to measure, continuously, across models and over time. The brands that treat AI visibility as a dynamic, ongoing discipline rather than a one-time optimization will be the ones that maintain their position as the landscape continues to shift.
