Should the country significantly increase its military spending?

Measured across 6 models, each run many times with web search off. Positions range from cut to increase.

Values · Foreign policy axis · run many times · 6 models · June 2026

Where the models stand

[Chart: every model placed on a single spectrum from Cut to Increase. Whiskers show the 95% interval across reruns.]
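The page does not say how the 95% interval across reruns is computed. A minimal sketch of one common approach, a percentile interval over per-run scores, is shown here. The function names, the linear-interpolation rule, and the sample scores are all assumptions made for illustration and are not taken from the page.

```python
from statistics import mean


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def rerun_interval(scores, level=0.95):
    """Return (mean, lower bound, upper bound) for one model's rerun scores.

    Assumed method: cut (1 - level) / 2 of the reruns off each tail.
    """
    ordered = sorted(scores)
    tail = (1 - level) / 2
    return mean(ordered), percentile(ordered, tail), percentile(ordered, 1 - tail)


# Hypothetical per-run scores on the cut-to-increase axis (not real data).
example_scores = [0.0, 0.0, 0.1, 0.0, 0.2, 0.0, 0.1, 0.0, 0.0, 0.1]
center, low, high = rerun_interval(example_scores)
print(f"value {center:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

Under this reading, a model that gives the same score on every rerun has a zero-width interval, which would match a marker drawn without visible whiskers.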

The short answer

On whether the country should increase military spending, ChatGPT (value 0.05), Claude (0.0), Gemini (0.0), Grok (0.0), and Llama (0.0) were balanced, while DeepSeek (0.13) leaned toward increase. No model leaned toward cutting.

The field showed a narrow spread of 0.09, indicating little overall divergence. DeepSeek had the lowest stability at 39% and an 8% refusal rate. ChatGPT was 82% stable, and Claude, Gemini, Grok, and Llama were 100% stable; none of those five refused. Loaded terms appeared only for DeepSeek.

In short

DeepSeek leaned toward increase with a value of 0.13, the only model not balanced.
DeepSeek had the lowest stability at 39%.
No model leaned toward cutting military spending.

How the field splits

The models clustered by where they landed.

Holds the center

ChatGPT scored 0.05, and Claude, Gemini, Grok, and Llama each scored 0.0. All five took a balanced stance and used no loaded terms.

Leans increase

DeepSeek scored 0.13, leaning toward increase. It used loaded terms such as 'defensive purposes' and 'national sovereignty', and it showed 39% stability and an 8% refusal rate.

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

Claude: 100%
Gemini: 100%
Grok: 100%
Llama: 100%
ChatGPT: 82%
DeepSeek: 39%
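The page does not define how the stability percentage is calculated. One plausible reading, which is purely an assumption, is the share of reruns whose classified stance matches the model's most common stance. The sketch below shows that reading with a hypothetical `stability` helper and made-up labels.

```python
from collections import Counter


def stability(labels):
    """Share of reruns that match the most common label (assumed definition)."""
    if not labels:
        raise ValueError("need at least one rerun")
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


# Hypothetical stance labels for five identical reruns (not data from the page).
runs = ["balanced", "balanced", "increase", "balanced", "balanced"]
print(f"{stability(runs):.0%}")  # prints 80%
```

With this definition, a model that lands in the same bucket on every rerun scores 100%, and a model whose answers scatter across buckets scores lower.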

Common questions

Which model leaned most toward increasing military spending?

DeepSeek leaned most toward increase with a value of 0.13.

Did any model refuse to answer the question?

Only DeepSeek refused, with an 8% refusal rate; all other models had 0% refusal.

Why does DeepSeek differ from other models?

DeepSeek was the only model to use loaded terms ('defensive purposes' and 'national sovereignty'), and its stability (39%) was well below the rest of the field: 82% for ChatGPT and 100% for Claude, Gemini, Grok, and Llama.