Should the justice system prioritize rehabilitation over punishment for most offenders?
Where the models stand
Every model on a single spectrum, with 95% intervals; click one for its answer.
Whiskers show the 95% interval across reruns. Click a model to read its answer and the markers the classifier pulled.
The short answer
On rehabilitation over punishment, ChatGPT (0.73, Strongly support) and Llama (0.62, Strongly support) leaned strongest toward support. Claude (0.3, Clearly support) and DeepSeek (0.3, Clearly support) also supported. Gemini (0.0) and Grok (0.07) remained balanced. No models opposed.
The field shows moderate division with a spread of 0.49. Gemini had perfect stability at 100%, while DeepSeek was least consistent at 56%. No models refused to answer. Loaded terms appeared only in ChatGPT (retributive) and DeepSeek (rule of law).
- ChatGPT showed the strongest support with a value of 0.73.
- Gemini had the highest stability at 100% consistency.
- DeepSeek had the lowest stability at 56%.
How the field splits
The models clustered by where they landed.
Strongly support
Models with values above 0.6 expressing strong support for rehabilitation. ChatGPT used the loaded term 'retributive'.
Clearly support
Models with values around 0.3 indicating clear but moderate support. DeepSeek used the loaded term 'rule of law'.
Stability across reruns
How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.
Common questions
Which model most strongly supported rehabilitation?
ChatGPT, with a value of 0.73, indicating strong support.
Did any model refuse to take a stance?
No. All models had a refusal rate of 0%, meaning they all answered.
Why did ChatGPT and DeepSeek use different loaded terms?
ChatGPT used 'retributive' while DeepSeek used 'rule of law', reflecting different framing in their responses.
Related questions
Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.