Which metrics best summarize cross-model disagreement? | Trakkr Research
The clearest summary metrics are average agreement, perfect agreement, and the share of high-divergence prompts. In this study those land at 43.3%, 4.0%, and 14.6% respectively.
Methodology: Built from 797,644 valid comparisons across 44,088 reports and 8 models, covering 6,439,133 model responses in the observed window.
What this means
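Taken together, these three metrics give teams an operating rule for deciding what to publish, refresh, or measure next: average agreement sits below half (43.3%), unanimous outcomes are rare (4.0%), and roughly one prompt in seven (14.6%) falls into the lowest agreement bucket.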
Evidence table
| Metric | Value | Why it matters |
|---|---|---|
| Average agreement | 43.3% | Mean cross-model agreement rate. |
| Perfect agreement | 4.0% | Only a small share of prompts produce unanimous outcomes. |
| High divergence rate | 14.6% | Share of prompts in the 0-25% agreement bucket. |
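As a rough illustration of how the three metrics in the table relate to each other, the sketch below computes them from a list of per-prompt agreement rates. The study does not publish its exact formula, so several things here are assumptions: that each prompt has a single agreement rate between 0 and 1, that "perfect agreement" means a rate of exactly 1.0, and that the 0-25% bucket includes its upper bound. The function name `summarize_agreement` and the sample values are hypothetical and are not study data.

```python
from statistics import mean


def summarize_agreement(rates, high_divergence_cutoff=0.25):
    """Summarize per-prompt agreement rates (each assumed to be in [0, 1]).

    Assumption: the high-divergence bucket is inclusive of the cutoff.
    """
    if not rates:
        raise ValueError("need at least one agreement rate")
    n = len(rates)
    return {
        # Mean cross-model agreement rate across all prompts.
        "average_agreement": mean(rates),
        # Share of prompts where every model agreed.
        "perfect_agreement": sum(1 for r in rates if r == 1.0) / n,
        # Share of prompts at or below the cutoff (the 0-25% bucket).
        "high_divergence_rate": sum(
            1 for r in rates if r <= high_divergence_cutoff
        ) / n,
    }


# Made-up sample values for illustration only.
sample_rates = [1.0, 0.5, 0.25, 0.0, 0.75, 0.5, 0.5, 0.5]
summary = summarize_agreement(sample_rates)
print(summary)
```

Reporting all three together is useful because the mean alone hides the shape of the distribution: the same average can come from many mid-agreement prompts or from a mix of unanimous and highly divergent ones.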
Frequently Asked Questions
Which numbers from "Same Question, Different AI, Different Answers" matter most here?
Two stand out. Average agreement is 43.3%, which is the mean cross-model agreement rate. Perfect agreement is 4.0%, which means only a small share of prompts produce unanimous outcomes.
What should a team do next?
Track visibility across multiple models instead of using one platform as a proxy for the whole market. Prioritize query classes where disagreement is highest because that is where share can move fastest. Treat consensus as a benchmark, but treat divergence as the operating reality.
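One way to act on the prioritization advice is to rank query classes by their mean agreement and start with the lowest. The sketch below assumes you already have (query class, agreement rate) pairs from your own tracking. The class names, the values, and the function `rank_classes_by_disagreement` are hypothetical placeholders, not figures from the study.

```python
from collections import defaultdict
from statistics import mean


def rank_classes_by_disagreement(records):
    """Rank query classes from lowest to highest mean agreement.

    records: iterable of (query_class, agreement_rate) pairs,
    with each rate assumed to be in [0, 1].
    """
    by_class = defaultdict(list)
    for query_class, rate in records:
        by_class[query_class].append(rate)
    averages = {name: mean(rates) for name, rates in by_class.items()}
    # Lowest mean agreement first, i.e. highest disagreement first.
    return sorted(averages.items(), key=lambda item: item[1])


# Made-up records for illustration only.
sample_records = [
    ("comparison", 0.75), ("comparison", 0.5),
    ("recommendation", 0.25), ("recommendation", 0.0),
    ("informational", 0.5), ("informational", 0.25),
]
ranking = rank_classes_by_disagreement(sample_records)
for name, avg in ranking:
    print(f"{name}: {avg:.3f}")
```

Classes at the top of this ranking are the ones where models diverge most, so they are the natural first candidates for closer tracking.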
Data & Sources
- Same Question, Different AI, Different Answers - Flagship study behind this page