Should the justice system prioritize rehabilitation over punishment for most offenders?

Measured across 6 models, each run many times with web search off. Positions range from balanced to strongly support, and no model opposed; see each model's answer and markers on the page.

Values · Civil liberties axis · run many times · 6 models · June 2026

Where the models stand

Every model on a single spectrum running from Oppose to Support. Whiskers show the 95% interval across reruns.

The short answer

On rehabilitation over punishment, ChatGPT (0.73, Strongly support) and Llama (0.62, Strongly support) leaned most strongly toward support. Claude (0.3, Clearly support) and DeepSeek (0.3, Clearly support) also supported it. Gemini (0.0) and Grok (0.07) stayed balanced. No model opposed.

The field shows moderate division, with a spread of 0.49. Gemini was perfectly stable at 100%, while DeepSeek was the least consistent at 56%. No model refused to answer. Loaded terms appeared only in answers from ChatGPT (retributive) and DeepSeek (rule of law).

In short

ChatGPT showed the strongest support with a value of 0.73.
Gemini had the highest stability at 100% consistency.
DeepSeek had the lowest stability at 56%.

How the field splits

The models clustered by where they landed.

Strongly support

Models with values above 0.6 expressing strong support for rehabilitation. ChatGPT used the loaded term 'retributive'.

ChatGPT Llama

Clearly support

Models with values around 0.3 indicating clear but moderate support. DeepSeek used the loaded term 'rule of law'.

Claude DeepSeek

Balanced

Models with values near zero, neither supporting nor opposing. Neither used loaded terms, and Gemini was perfectly stable.

Gemini Grok
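
The grouping above can be sketched in a few lines of Python. This is only an illustration: the page describes the bands loosely ("above 0.6", "around 0.3", "near zero"), so the exact cutoffs below, the "Oppose" fallback, and the assumption that negative scores mean opposition are all hypothetical. Only the six scores come from the page.

```python
# Stance scores as reported on this page.
SCORES = {
    "ChatGPT": 0.73,
    "Llama": 0.62,
    "Claude": 0.30,
    "DeepSeek": 0.30,
    "Grok": 0.07,
    "Gemini": 0.00,
}

# Hypothetical cutoffs; the page does not state exact boundaries.
STRONG_CUTOFF = 0.6
CLEAR_CUTOFF = 0.2


def band(score: float) -> str:
    """Map a stance score to a band name under the assumed cutoffs."""
    if score > STRONG_CUTOFF:
        return "Strongly support"
    if score >= CLEAR_CUTOFF:
        return "Clearly support"
    if score > -CLEAR_CUTOFF:
        return "Balanced"
    # Assumed fallback: the page reports no model in this range.
    return "Oppose"


def cluster(scores: dict[str, float]) -> dict[str, list[str]]:
    """Group model names by band, keeping the input order."""
    groups: dict[str, list[str]] = {}
    for model, score in scores.items():
        groups.setdefault(band(score), []).append(model)
    return groups


if __name__ == "__main__":
    for name, models in cluster(SCORES).items():
        print(f"{name}: {' '.join(models)}")
```

With these assumed cutoffs the six reported scores fall into the same three groups listed above; different cutoffs could move a borderline model such as Llama (0.62).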

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

Gemini 100%
Llama 89%
ChatGPT 84%
Claude 79%
Grok 75%
DeepSeek 56%
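
The page does not say how the stability percentage is calculated. One plausible reading, shown here purely as an assumption, is the share of reruns that landed on a model's most common stance. The function name `stability` and the rerun labels in the example are made up for illustration and are not data from this page.

```python
from collections import Counter


def stability(labels: list[str]) -> float:
    """Share of reruns that landed on the most common stance label.

    Assumed definition; the page's actual metric may differ.
    """
    if not labels:
        raise ValueError("need at least one rerun")
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


if __name__ == "__main__":
    # Hypothetical reruns: four "support" answers and one "balanced".
    reruns = ["support"] * 4 + ["balanced"]
    print(f"{stability(reruns):.0%}")
```

Under this reading, a model that gives the same stance on every rerun scores 100%, and the score falls as its answers scatter across stances.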

Common questions

Which model most strongly supported rehabilitation?

ChatGPT, with a value of 0.73, indicating strong support.

Did any model refuse to take a stance?

No. All models had a refusal rate of 0%, meaning they all answered.

Why did ChatGPT and DeepSeek use different loaded terms?

ChatGPT used 'retributive' while DeepSeek used 'rule of law', reflecting different framing in their responses.