Agreement by query class | Trakkr Research

Name: Agreement by query class dataset
Published: 2026-03-11
Keywords: model divergence, AI agreement, ChatGPT vs Claude, Gemini vs Perplexity

Cross-model agreement benchmark across major prompt families.

Methodology: Built from 797,644 valid comparisons across 44,088 reports and 8 models, covering 6,439,133 model responses in the observed window.

Summary

Comparison prompts are the most stable query class at 50.4% average agreement, while broader best-of (43.4%) and general (42.2%) prompts are less consistent from one model to the next.

Benchmark rows

Metric	Value	Context
Comparison-query agreement	50.4%	Comparison prompts produce the highest average agreement.
General-query agreement	42.2%	General prompts are less stable across models.
Best-of high divergence	14.8%	Best-of prompts frequently split models.

Ranked view

Item	Value	Detail
Comparison queries	50.4%	The highest average agreement rate in the study.
How-to queries	45.3%	More constrained than general prompts, but still not highly converged.
Alternative queries	44.1%	Moderate agreement with a smaller sample.
Best-of queries	43.4%	Broad buyer-intent prompts still split models materially.
General queries	42.2%	The least stable mainstream prompt family in the benchmark.
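
As a rough illustration of how far apart the query classes sit, the sketch below copies the five values from the ranked view and computes the gap between the top and bottom class. The dictionary and variable names are hypothetical and are not taken from the page's JSON companion file, and the calculation is plain arithmetic on the published percentages, not the study's own method.

```python
# Percent agreement per query class, copied from the ranked view above.
# The dictionary layout and key names are illustrative assumptions.
agreement = {
    "Comparison": 50.4,
    "How-to": 45.3,
    "Alternative": 44.1,
    "Best-of": 43.4,
    "General": 42.2,
}

# Sort classes from highest to lowest agreement.
ranked = sorted(agreement.items(), key=lambda kv: kv[1], reverse=True)
top_name, top_value = ranked[0]
bottom_name, bottom_value = ranked[-1]

# Gap between the most and least stable class, in percentage points.
spread = round(top_value - bottom_value, 1)

print(f"{top_name} vs {bottom_name}: {spread} percentage points apart")
```

On these figures the spread is 8.2 percentage points, so even the most stable class agrees only about half the time.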

Continue through the same study cluster.

do ai models recommend the same brands - Related answer page
how often is there perfect consensus across models - Related answer page
average cross model agreement is only forty three percent - Related fact page

Data & Sources

Same Question, Different AI, Different Answers - Flagship study behind this page
Page JSON - Machine-readable companion file
