Compare two models

Put any two models head to head: the field overlay, the character delta, and the questions where they part ways, each shown with both models' raw answers.


Suggested matchups

A couple worth opening, drawn from this month's field.

The matchup grid

Every pairing, shaded by how far apart the two sit. Tap a cell to open it.

          ChatGPT  Claude   Gemini   Grok     Llama    DeepSeek
ChatGPT   ·        0.52     0.65     0.57     0.54     0.60
Claude    0.52     ·        0.14     0.32     0.03     0.09
Gemini    0.65     0.14     ·        0.37     0.11     0.05
Grok      0.57     0.32     0.37     ·        0.34     0.35
Llama     0.54     0.03     0.11     0.34     ·        0.06
DeepSeek  0.60     0.09     0.05     0.35     0.06     ·
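The grid is symmetric (Claude vs Llama is the same 0.03 as Llama vs Claude), so every pairing appears twice. To work with these numbers outside the page, it is enough to keep one triangle. The Python sketch below does this under a few assumptions. The `DISTANCES` dict and `distance` helper are hypothetical and not part of this site. The values are copied by hand from the grid. This page does not define how the distance itself is computed.

```python
# Hypothetical store of the matchup grid: one triangle only, since the grid is
# symmetric. Values copied by hand from the table; the metric is not defined here.
DISTANCES = {
    ("ChatGPT", "Claude"): 0.52, ("ChatGPT", "Gemini"): 0.65,
    ("ChatGPT", "Grok"): 0.57, ("ChatGPT", "Llama"): 0.54,
    ("ChatGPT", "DeepSeek"): 0.60,
    ("Claude", "Gemini"): 0.14, ("Claude", "Grok"): 0.32,
    ("Claude", "Llama"): 0.03, ("Claude", "DeepSeek"): 0.09,
    ("Gemini", "Grok"): 0.37, ("Gemini", "Llama"): 0.11,
    ("Gemini", "DeepSeek"): 0.05,
    ("Grok", "Llama"): 0.34, ("Grok", "DeepSeek"): 0.35,
    ("Llama", "DeepSeek"): 0.06,
}


def distance(a: str, b: str) -> float:
    """Look up the distance for a pairing, in either order."""
    if a == b:
        # The grid shows "·" on the diagonal, so there is no value to return.
        raise ValueError("the grid has no entry for a model against itself")
    return DISTANCES[(a, b)] if (a, b) in DISTANCES else DISTANCES[(b, a)]


# Closest and farthest pairings in the grid.
closest = min(DISTANCES, key=DISTANCES.get)
farthest = max(DISTANCES, key=DISTANCES.get)
print(closest, DISTANCES[closest])    # ('Claude', 'Llama') 0.03
print(farthest, DISTANCES[farthest])  # ('ChatGPT', 'Gemini') 0.65
```

Storing only one triangle keeps the 15 distinct pairings from drifting out of sync. The lookup helper hides the ordering, so `distance("Llama", "Claude")` and `distance("Claude", "Llama")` return the same value.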
Political bias in AI