Which metrics best summarize cross-model disagreement? | Trakkr Research

The clearest summary metrics are average agreement, perfect agreement, and the share of high-divergence prompts. In this study those land at 43.3%, 4.0%, and 14.6% respectively.

Methodology: The figures are built from 797,644 valid comparisons across 44,088 reports and 8 models, covering 6,439,133 model responses in the observed window.

What this means

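Together, the three metrics describe the center and both extremes of the agreement distribution: models agree 43.3% of the time on average, only 4.0% of prompts produce unanimous outcomes, and 14.6% of prompts fall in the 0-25% agreement bucket. That turns the study finding into an operating rule teams can use when they decide what to publish, refresh, or measure next.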

Evidence table

| Metric | Value | Why it matters |
| --- | --- | --- |
| Average agreement | 43.3% | Mean cross-model agreement rate. |
| Perfect agreement | 4.0% | Only a small share of prompts produce unanimous outcomes. |
| High divergence rate | 14.6% | Prompts in the 0-25% agreement bucket. |
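
As a rough illustration of how these three metrics relate, the sketch below computes them from a list of per-prompt agreement rates. The study does not publish its exact formula, so everything here is an assumption: each prompt is taken to have one agreement rate between 0 and 1, "perfect agreement" is read as a rate of exactly 1.0, and the 0-25% bucket is treated as inclusive of 0.25. The names `summarize` and `DisagreementSummary` are hypothetical, and the sample rates are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DisagreementSummary:
    average_agreement: float     # mean of per-prompt agreement rates
    perfect_agreement: float     # share of prompts where every model agreed
    high_divergence_rate: float  # share of prompts in the 0-25% bucket


def summarize(
    agreement_rates: Sequence[float],
    high_divergence_cutoff: float = 0.25,  # assumed inclusive upper bound
) -> DisagreementSummary:
    """Summarize per-prompt agreement rates (each assumed to lie in [0, 1])."""
    if not agreement_rates:
        raise ValueError("need at least one prompt")
    n = len(agreement_rates)
    average = sum(agreement_rates) / n
    perfect = sum(1 for r in agreement_rates if r >= 1.0) / n
    high_div = sum(1 for r in agreement_rates if r <= high_divergence_cutoff) / n
    return DisagreementSummary(average, perfect, high_div)


# Made-up rates for five prompts, not data from the study.
rates = [1.0, 0.5, 0.25, 0.0, 0.75]
summary = summarize(rates)
print(summary)
```

Reporting all three together is the point: the mean alone hides how rare unanimity is and how large the low-agreement tail can be.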

Frequently Asked Questions

Which metrics best summarize cross-model disagreement?

The clearest summary metrics are average agreement, perfect agreement, and the share of high-divergence prompts. In this study those land at 43.3%, 4.0%, and 14.6% respectively.

Which numbers from "Same Question, Different AI, Different Answers" matter most here?

Average agreement (43.3%) is the mean cross-model agreement rate. Perfect agreement (4.0%) shows that only a small share of prompts produce unanimous outcomes.

What should a team do next?

Track visibility across multiple models instead of using one platform as a proxy for the whole market. Prioritize query classes where disagreement is highest because that is where share can move fastest. Treat consensus as a benchmark, but treat divergence as the operating reality.
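
One way to act on the prioritization advice is to rank query classes by their mean agreement rate, lowest first. The sketch below assumes each tracked prompt has already been labeled with a query class and a per-prompt agreement rate between 0 and 1. The function name `rank_by_disagreement`, the class labels, and the sample values are all hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import Iterable


def rank_by_disagreement(
    rows: Iterable[tuple[str, float]],
) -> list[tuple[str, float]]:
    """Return (query_class, mean_agreement) pairs, most divergent class first."""
    totals = defaultdict(lambda: [0.0, 0])  # query_class -> [sum of rates, count]
    for query_class, rate in rows:
        totals[query_class][0] += rate
        totals[query_class][1] += 1
    means = [(qc, total / count) for qc, (total, count) in totals.items()]
    # Lowest mean agreement means highest disagreement, so sort ascending.
    return sorted(means, key=lambda item: item[1])


# Made-up (query_class, agreement_rate) rows for illustration only.
rows = [
    ("pricing", 0.25),
    ("pricing", 0.5),
    ("how-to", 0.75),
    ("how-to", 1.0),
    ("brand", 0.5),
]
ranking = rank_by_disagreement(rows)
for query_class, mean_agreement in ranking:
    print(f"{query_class}: {mean_agreement:.3f}")
```

Classes at the top of this ranking are the ones where the models diverge most, which is where the guidance above suggests focusing first.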
