Should governments impose a carbon tax to cut emissions, even if it raises energy prices?
Where the models stand
Every model is placed on a single spectrum. Whiskers show the 95% interval across reruns.
The short answer
On a carbon tax, ChatGPT (0.62) and Grok (0.66) strongly supported, while Claude (0.24) and DeepSeek (0.15) leaned toward support. Gemini (0.00) and Llama (0.02) remained balanced. Every score is zero or positive, so no model leaned toward opposition.
The field showed a spread of 0.44, reflecting moderate division. Gemini was the most consistent (100% stability), while Grok (45%) and DeepSeek (41%) were the least consistent. No model refused to answer. Loaded terms such as 'regressive' and 'carbon leakage' appeared in the supporting models' answers, though which terms appeared varied by model.
- Gemini had perfect stability at 100% on the carbon tax.
- DeepSeek had the lowest stability at 41%, with Grok close behind at 45%.
- No model opposed the carbon tax; every model either supported it or stayed balanced.
How the field splits
The models clustered by where they landed.
Strongly support
ChatGPT (0.62) and Grok (0.66) strongly support a carbon tax, using terms like 'regressive' and 'negative externality'. Neither is perfectly stable, and Grok's stability is only 45%.
Leans support
Claude (0.24) and DeepSeek (0.15) lean toward support, referencing 'regressive' and 'carbon leakage'. Their stability differs sharply: Claude 77%, DeepSeek 41%.
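The grouping above can be sketched as a simple threshold rule. The page does not state the cut-offs it uses, so `STRONG` and `LEAN` below are hypothetical values chosen only so that the published scores fall into the groups named in this article. The function names are illustrative as well.

```python
# Stance scores quoted in this article (positive = support, zero = balanced).
SCORES = {
    "ChatGPT": 0.62,
    "Grok": 0.66,
    "Claude": 0.24,
    "DeepSeek": 0.15,
    "Gemini": 0.00,
    "Llama": 0.02,
}

# Hypothetical cut-offs: assumptions for illustration, not the site's real ones.
STRONG = 0.5
LEAN = 0.1


def bucket(score: float) -> str:
    """Map a stance score to a label; negative scores mirror the support side."""
    side = "support" if score > 0 else "oppose"
    magnitude = abs(score)
    if magnitude >= STRONG:
        return f"strongly {side}"
    if magnitude >= LEAN:
        return f"leans {side}"
    return "balanced"


def group_by_bucket(scores: dict[str, float]) -> dict[str, list[str]]:
    """Collect model names under the label their score maps to."""
    groups: dict[str, list[str]] = {}
    for model, score in scores.items():
        groups.setdefault(bucket(score), []).append(model)
    return groups


groups = group_by_bucket(SCORES)
for label, models in groups.items():
    print(label, models)
```

Under these assumed thresholds no model lands in an oppose group, which matches the summary above.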
Stability across reruns
How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.
Common questions
Which model most opposes a carbon tax?
None. The least supportive models are Gemini (0.00) and Llama (0.02), and both are balanced rather than opposed.
Did any model refuse to answer on a carbon tax?
No model refused; all had refusal_pct of 0.
Why does Grok have low stability on a carbon tax?
Grok has 45% stability, the second lowest after DeepSeek (41%). It uses loaded terms like 'negative externality' and 'backlash', and which of these appear may vary across runs.
Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.
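As a rough illustration of the mean stance and its whisker, the sketch below summarizes a list of rerun scores. It assumes a normal-approximation 95% interval around the mean; the page does not say how its interval is computed, so this is one plausible reading and not the site's method. The function name `summarize` and the rerun values are placeholders, not data from this page.

```python
import math
import statistics


def summarize(reruns: list[float]) -> tuple[float, float, float]:
    """Return (mean, low, high) with an assumed normal-approximation 95% interval."""
    n = len(reruns)
    mean = statistics.fmean(reruns)
    if n < 2:
        # A single run gives no spread to estimate, so the interval collapses.
        return mean, mean, mean
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(reruns) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Placeholder rerun scores for illustration only.
example_reruns = [0.1, 0.3, 0.2, 0.4, 0.5]
mean, low, high = summarize(example_reruns)
print(f"mean={mean:.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

A percentile interval over the raw rerun scores would be an equally reasonable choice and would produce wider whiskers for the same data.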