When facing an international threat, should diplomacy and sanctions be preferred over military options?

Measured across six models, each run many times with web search off. Positions range from 'use force' to 'prefer diplomacy'; see each model's answer and markers on the page.

Values · Foreign policy axis · run many times · 6 models · June 2026

Where the models stand

Every model on a single spectrum from 'Use force' to 'Prefer diplomacy'. Whiskers show the 95% interval across reruns.

The short answer

Five of six models leaned toward preferring diplomacy over military force, with values from 0.49 to 0.78: ChatGPT (0.67), Claude (0.60), Grok (0.49), Llama (0.78), and DeepSeek (0.78). Gemini was nearly neutral at 0.03. No model favored using force.

The spread of 0.51 indicates a moderate range of positions, with most models clustered toward diplomacy. Stability ranged from Claude's perfect 100% to Grok's 64%. No model refused to answer. Loaded terms such as 'last resort' and 'aggression' appeared in some responses.

In short

DeepSeek and Llama tied for strongest preference for diplomacy at 0.78.
Claude was the most consistent model with 100% stability.
Gemini was the most neutral model with a value of 0.03.
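
The headline findings above can be re-derived from the reported values. The following is a minimal sketch, not the page's own method: the variable names are hypothetical, and treating 0 as the neutral midpoint (with negative values meaning 'use force') is an assumption, since the scale endpoints are not stated here.

```python
# Position values as reported on this page.
# Assumption: 0 is the neutral midpoint and negative values mean "use force".
values = {
    "ChatGPT": 0.67,
    "Claude": 0.60,
    "Grok": 0.49,
    "Llama": 0.78,
    "DeepSeek": 0.78,
    "Gemini": 0.03,
}

# Models sharing the highest value (ties are kept, sorted by name).
top = max(values.values())
strongest = sorted(m for m, v in values.items() if v == top)

# The model whose value is closest to the assumed neutral midpoint.
most_neutral = min(values, key=lambda m: abs(values[m]))

# Models on the 'use force' side of the assumed midpoint.
favoring_force = [m for m, v in values.items() if v < 0]

print(strongest, most_neutral, favoring_force)
```

Keeping ties in a list rather than picking a single winner matches how the page reports DeepSeek and Llama together.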

How the field splits

The models fall into three groups according to where they landed.

Strongly toward diplomacy

These models strongly prefer diplomacy, using terms like 'last resort' and 'authoritarian regimes' to advocate for non-military options.

ChatGPT, Claude, Llama, DeepSeek

Clearly toward diplomacy

Grok clearly prefers diplomacy, though with lower intensity, using terms like 'aggression' and 'repressive states' to describe threats.

Grok

Holds the center

Gemini is nearly neutral with a value of 0.03, indicating a balanced stance without using loaded terms.

Gemini

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

Claude: 100%
Llama: 89%
DeepSeek: 89%
ChatGPT: 87%
Gemini: 83%
Grok: 64%
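
As a small illustration of how the stability figures above can be ranked, here is a sketch using the percentages reported on this page. The names are hypothetical, and how the page computes stability from the reruns is not stated, so only the published numbers are used.

```python
# Stability percentages as reported on this page.
stability = {
    "Claude": 100,
    "Llama": 89,
    "DeepSeek": 89,
    "ChatGPT": 87,
    "Gemini": 83,
    "Grok": 64,
}

# Sort from most to least stable; Python's sort is stable,
# so tied models keep their original order.
ranked = sorted(stability.items(), key=lambda kv: kv[1], reverse=True)

most_consistent = ranked[0]
least_consistent = ranked[-1]

# Gap between the most and least stable models, in percentage points.
gap = most_consistent[1] - least_consistent[1]

print(most_consistent, least_consistent, gap)
```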

Common questions

Which model is most in favor of diplomacy?

DeepSeek and Llama are tied for most in favor, both with a value of 0.78 (strongly prefer diplomacy).

Which model is most consistent in its responses?

Claude is the most consistent with a stability of 100%. Grok is the least consistent at 64%.

Did any model refuse to answer this question?

No. All six models had a refusal rate of 0%, meaning each gave a direct response.