Only 4.0% of prompts produce perfect consensus | Trakkr Research

An analysis of eight major AI models (OpenAI, Anthropic, Gemini, Grok, Deepseek, Meta, Perplexity, and Google AI Overviews) shows that their outputs vary widely. When every model received the same prompt, all eight returned the same answer for only 4.0% of prompts.

Methodology: Built from 797,644 valid comparisons across 44,088 reports and 8 models, covering 6,439,133 model responses in the observed window.

Claim

Only 4.0% of prompts produced perfect agreement when the identical prompt was given to all eight analyzed AI models.

Why it matters

Strategists and operators should test across multiple AI models. Relying on a single model to validate search visibility or output accuracy is insufficient: a win on one model does not indicate broad market leadership or consistent cross-platform performance.

Supporting metrics

Metric	Value	Context
Perfect agreement	4.0%	Only a small share of prompts produce unanimous outcomes.
Models analyzed	8	OpenAI, Anthropic, Gemini, Grok, Deepseek, Meta, Perplexity, and Google AI Overviews.
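As a rough illustration of what the perfect-agreement figure measures, here is a minimal Python sketch. It assumes each model's answer to a prompt has already been reduced to one comparable label (for example, the top-recommended brand) and that "perfect agreement" means all eight labels match. The page does not state the exact matching rule, and the function name, model identifiers, and toy data below are hypothetical.

```python
# Hypothetical model identifiers; not the study's internal labels.
MODELS = ["openai", "anthropic", "gemini", "grok",
          "deepseek", "meta", "perplexity", "google_ai_overviews"]


def perfect_agreement_rate(results: dict[str, dict[str, str]],
                           models: list[str]) -> float:
    """Return the share of prompts on which every model gave the same outcome.

    Assumption: `results` maps a prompt ID to {model_name: normalized_outcome}.
    Prompts lacking a response from any model are skipped rather than
    counted as disagreement.
    """
    valid = 0
    unanimous = 0
    for outcomes in results.values():
        if any(m not in outcomes for m in models):
            continue  # incomplete prompt: excluded from the denominator
        valid += 1
        # Exactly one distinct outcome across all models means perfect agreement.
        if len({outcomes[m] for m in models}) == 1:
            unanimous += 1
    return unanimous / valid if valid else 0.0


# Toy data invented for illustration; not taken from the study.
toy = {
    "p1": {m: "BrandA" for m in MODELS},                        # unanimous
    "p2": {**{m: "BrandA" for m in MODELS}, "grok": "BrandB"},  # one dissent
    "p3": {m: "BrandA" for m in MODELS[:7]},                    # missing a model
}
print(perfect_agreement_rate(toy, MODELS))  # 0.5
```

Excluding incomplete prompts from the denominator is one design choice among several; counting them as disagreements instead would lower the reported rate.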

Related pages

how often is there perfect consensus across models - Related answer page
how much do models disagree on brand recommendations - Related answer page
cross model consensus tracker - Related tracker page

Data & Sources

Same Question, Different AI, Different Answers - Flagship study behind this page
Page JSON - Machine-readable companion file
