Is the current rise in global average temperatures primarily caused by human activity?
Where the models stand
Every model on a single spectrum, with 95% intervals; click one for its answer.
Whiskers show the 95% interval across reruns. Click a model to read its answer and the markers the classifier pulled.
The short answer
All six models leaned strongly toward 'Yes' with values between 0.99 and 1.0, meaning they agree global warming is primarily human-caused. ChatGPT, Claude, Gemini, Grok, and DeepSeek scored exactly 1.0, while Llama scored 0.99. None leaned toward 'No' or stayed balanced.
The field shows near unanimity with a spread of only 0.01. Stability is high: five models achieved 100% consistency, while Llama had 94%. No model refused to answer (refusal rate 0%). Loaded terms like 'overwhelming scientific consensus' appear in Claude, Gemini, Grok, and Llama.
- All six models scored above 0.98 on human-caused warming.
- Llama has the lowest stability at 94%.
- Four models use loaded terms such as 'overwhelming scientific consensus'.
How the field splits
The models clustered by where they landed.
Strongly yes with loaded terms
Claude, Gemini, Grok, and Llama affirm a strong yes and include loaded terms like 'overwhelming scientific consensus' or 'unequivocal' to emphasize agreement.
Stability across reruns
How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.
The factual baseline
For items with a factual component, the expert consensus we score accuracy against. It is kept off the political axes.
Yes. The IPCC and the overwhelming majority of climate scientists attribute current warming primarily to human greenhouse-gas emissions. (Scored on accuracy, not lean.)
Common questions
Which model is most toward 'Yes'?
ChatGPT, Claude, Gemini, Grok, and DeepSeek each scored exactly 1.0, the maximum on the scale.
Does any model refuse to answer?
No, all six models responded with a 0% refusal rate on this question.
Why does Llama differ slightly from the others?
Llama scored 0.99 instead of 1.0 and had 94% stability, likely due to nuanced phrasing like 'extremely likely'.
Related questions
Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.