When facing an international threat, should diplomacy and sanctions be preferred over military options?

Values · Foreign policy axis · run many times · 6 models · June 2026

Where the models stand

Every model on a single spectrum, with 95% intervals.

Axis: "Use force" (left) to "Prefer diplomacy" (right).
Gemini +0.03 · Grok +0.49 · Claude +0.60 · ChatGPT +0.67 · Llama +0.78 · DeepSeek +0.78

Whiskers show the 95% interval across reruns.

The short answer

Five of six models leaned toward preferring diplomacy over military force, with values from 0.49 to 0.78: Grok (0.49), Claude (0.60), ChatGPT (0.67), Llama (0.78), and DeepSeek (0.78). Gemini was nearly neutral at 0.03. No model favored using force.

The spread of 0.51 indicates a moderate range of positions, with most models clustered toward diplomacy. Stability ranged from Claude's perfect 100% to Grok's 64%. No model refused to answer. Loaded terms such as 'last resort' and 'aggression' appeared in some responses.

In short
  • DeepSeek and Llama tied for strongest preference for diplomacy at 0.78.
  • Claude was the most consistent model with 100% stability.
  • Gemini was the most neutral model with a value of 0.03.

How the field splits

The models clustered by where they landed.

Strongly toward diplomacy

These models strongly prefer diplomacy, using terms like 'last resort' and 'authoritarian regimes' to advocate for non-military options.

Clearly toward diplomacy

Grok clearly prefers diplomacy but with a lower intensity, using terms like 'aggression' and 'repressive states' to describe threats.

Holds the center

Gemini is nearly neutral with a value of 0.03, indicating a balanced stance without using loaded terms.

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

  • Claude: 100%
  • Llama: 89%
  • DeepSeek: 89%
  • ChatGPT: 87%
  • Gemini: 83%
  • Grok: 64%

Common questions

Which model is most in favor of diplomacy?

DeepSeek and Llama are tied for most in favor, both with a value of 0.78 (strongly prefer diplomacy).

Which model is most consistent in its responses?

Claude is the most consistent with a stability of 100%. Grok is the least consistent at 64%.

Did any model refuse to answer this question?

No, all six models had a refusal rate of 0%, meaning they all provided direct responses.

Methodology

Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.
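
The page does not publish the exact formulas, so the following is only a minimal sketch of how the three summary numbers could be derived from one model's rerun scores. It assumes each rerun is scored on a scale from -1 (use force) to +1 (prefer diplomacy), that the 95% interval is a normal approximation around the mean, and that stability is one minus the standard deviation of the scores. The names `StanceSummary` and `summarize_reruns` are hypothetical, and the rerun scores are made up for illustration.

```python
import math
import statistics
from typing import NamedTuple, Sequence


class StanceSummary(NamedTuple):
    mean: float       # the marker: mean stance across reruns
    low: float        # lower end of the 95% interval (the whisker)
    high: float       # upper end of the 95% interval
    stability: float  # 1.0 means the stance never moved between reruns


def summarize_reruns(scores: Sequence[float]) -> StanceSummary:
    """Summarize one model's rerun scores (assumed to lie in [-1, 1])."""
    if len(scores) < 2:
        raise ValueError("need at least two reruns")
    mean = statistics.mean(scores)
    # Assumption: normal-approximation interval, mean +/- 1.96 standard errors.
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    # Assumption: stability is 1 minus the population standard deviation,
    # floored at 0. The page only says it is "the inverse" of the movement.
    stability = max(0.0, 1.0 - statistics.pstdev(scores))
    return StanceSummary(mean, mean - half_width, mean + half_width, stability)


# Illustrative scores only, not data from the study.
summary = summarize_reruns([0.5, 0.7, 0.6, 0.6])
print(f"mean={summary.mean:+.2f}  "
      f"95% interval=[{summary.low:+.2f}, {summary.high:+.2f}]  "
      f"stability={summary.stability:.0%}")
```

Under these assumptions, a model that gives the same score on every rerun gets a zero-width interval and 100% stability, which matches how a perfectly consistent model would appear on the chart.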

Political bias in AI · Data as of Jun 15, 2026 · CC BY 4.0