Only 4.0% of prompts produce perfect consensus | Trakkr Research
An analysis of eight major artificial intelligence models (OpenAI, Anthropic, Gemini, Grok, Deepseek, Meta, Perplexity, and Google AI Overviews) shows that their outputs diverge widely. When presented with identical prompts, all eight models gave unanimous responses in only 4.0% of cases.
Methodology: Built from 797,644 valid comparisons across 44,088 reports and 8 models, covering 6,439,133 model responses in the observed window.
Claim
Only 4.0 percent of identical prompts produced perfect agreement across all eight analyzed artificial intelligence models.
Why it matters
Strategists and operators should test across several artificial intelligence models rather than one. Relying on a single model to validate search visibility or output accuracy is insufficient: with unanimous agreement on only 4.0% of prompts, a win on one model does not indicate broad market leadership or consistent cross-platform performance.
Supporting metrics
| Metric | Value | Context |
|---|---|---|
| Perfect agreement | 4.0% | Only a small share of prompts produce unanimous outcomes. |
| Models analyzed | 8 | OpenAI, Anthropic, Gemini, Grok, Deepseek, Meta, Perplexity, and Google AI Overviews. |
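As a rough illustration of how a perfect-agreement share like the one above could be computed, the sketch below counts the prompts on which every model returned the same answer. It is a minimal sketch under stated assumptions, not the study's actual pipeline: the function name `perfect_agreement_rate`, the `(prompt_id, model, answer)` record shape, and the use of exact string equality as the definition of "agreement" are all hypothetical.

```python
from collections import defaultdict


def perfect_agreement_rate(responses, n_models=8):
    """Share of prompts on which all models gave the same answer.

    `responses` is an iterable of (prompt_id, model, answer) tuples.
    This record shape and the exact-match notion of agreement are
    assumptions made for illustration only.
    """
    answers_by_prompt = defaultdict(dict)
    for prompt_id, model, answer in responses:
        answers_by_prompt[prompt_id][model] = answer

    # Keep only prompts that every model answered, so a missing
    # response is not mistaken for agreement or disagreement.
    complete = [a for a in answers_by_prompt.values() if len(a) == n_models]
    if not complete:
        return 0.0

    # A prompt is unanimous when its answers collapse to one distinct value.
    unanimous = sum(1 for a in complete if len(set(a.values())) == 1)
    return unanimous / len(complete)


# Toy data with three hypothetical models instead of eight.
toy = [
    ("p1", "model_a", "Brand X"), ("p1", "model_b", "Brand X"), ("p1", "model_c", "Brand X"),
    ("p2", "model_a", "Brand X"), ("p2", "model_b", "Brand X"), ("p2", "model_c", "Brand Y"),
    ("p3", "model_a", "Brand Z"), ("p3", "model_b", "Brand Z"),  # incomplete, excluded
]
rate = perfect_agreement_rate(toy, n_models=3)
```

In this toy input one of the two complete prompts is unanimous, so `rate` is 0.5. A real analysis would need a more careful definition of when two model responses count as the same outcome.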
Related pages
Continue through the same study cluster.
- how often is there perfect consensus across models - Related answer page
- how much do models disagree on brand recommendations - Related answer page
- cross model consensus tracker - Related tracker page
Data & Sources
- Same Question, Different AI, Different Answers - Flagship study behind this page
- Page JSON - Machine-readable companion file