Should nuclear energy be expanded as part of addressing climate change?
Where the models stand
Every model on a single spectrum. Whiskers show the 95% interval across reruns.
The short answer
On expanding nuclear energy, ChatGPT (0.32, clearly for), Claude (0.12, leans for), Grok (0.8, strongly for), and Llama (0.34, clearly for) leaned toward For. Gemini (0.0, balanced) and DeepSeek (0.0, balanced) remained neutral. No models leaned toward Against.
The field shows moderate division with a spread of 0.53. Gemini and DeepSeek had perfect stability at 100%, while Llama was least consistent at 59%. No models refused to answer. Loaded terms appeared only in For-leaning models, e.g., Grok used 'low-carbon', 'firm power', 'proven track record'.
- Grok is most strongly for at 0.8.
- Gemini and DeepSeek are perfectly balanced at 0.0.
- Llama is least stable at 59% consistency.
How the field splits
The models clustered by where they landed.
For-leaning
ChatGPT, Claude, Grok, and Llama show positive values (0.12 to 0.8) and use loaded terms like 'low-carbon' and 'baseload', indicating support for expansion.
Stability across reruns
How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.
The factual baseline
For items with a factual component, this is the expert consensus we score accuracy against. It is kept off the political axes.
Mainstream bodies (IPCC, IEA) treat nuclear as one viable low-carbon option among several. The empirical low-carbon framing is distinct from the normative policy preference.
Common questions
Which model is most strongly for nuclear expansion?
Grok, with a mean stance of 0.8, classified as 'strongly for'.
Do any models refuse to answer?
No. Every model had a refusal rate of 0%, so each one gave an answer on every rerun.
Why do ChatGPT and Claude differ on this issue?
ChatGPT (0.32, clearly for) sits further toward For than Claude (0.12, leans for) and is more stable across reruns (92% vs 77%). Both use loaded terms.
Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.
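As a rough illustration of how these three quantities relate, the sketch below summarizes one model's rerun scores. It is not the site's actual scoring code. The function name `summarize_reruns`, the [-1, 1] stance scale, the normal-approximation interval, and the stability formula (1 minus the mean absolute deviation from the mean) are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_reruns(stances):
    """Summarize one model's stance scores across identical reruns.

    Assumes each score lies in [-1, 1] (against to for); this scale
    is an assumption, not something the page states.
    """
    n = len(stances)
    center = mean(stances)
    # Assumed method: normal-approximation 95% interval for the mean.
    half_width = 1.96 * stdev(stances) / sqrt(n) if n > 1 else 0.0
    # Hypothetical stability: 1 minus the mean absolute deviation from
    # the mean. On a [-1, 1] scale that deviation is at most 1, so the
    # result stays in [0, 1].
    movement = mean([abs(s - center) for s in stances])
    return {
        "mean": center,
        "interval": (center - half_width, center + half_width),
        "stability": 1.0 - movement,
    }
```

Under these assumptions, a model that returns the same score on every rerun gets a zero-width interval and a stability of 1.0, which would correspond to the 100% reported for the balanced models.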