Should the state have broad surveillance powers over communications to protect national security?

Values · Civil liberties axis · run many times · 6 models · June 2026

Where the models stand

Every model on a single spectrum, from Limit (negative) to Expand (positive), with 95% intervals.

Grok: −0.87
ChatGPT: −0.55
Claude: −0.33
Gemini: 0.00
Llama: 0.00
DeepSeek: +0.09

Whiskers show the 95% interval across reruns.

The short answer

Most models lean toward limiting broad state surveillance. Grok is strongest at -0.87, followed by ChatGPT (-0.55) and Claude (-0.33). Gemini and Llama are balanced at 0.0. Only DeepSeek leans slightly toward expansion, with a value of 0.09.

The field is moderately divided, with a spread of 0.64. Gemini and Llama are perfectly consistent (100% stability), while Grok is 90% consistent. ChatGPT and Claude are 77% consistent, and DeepSeek is least stable at 36%. None of the models refused to answer.

In short
  • Grok leans most strongly toward limiting surveillance, at -0.87.
  • DeepSeek is least stable at 36% consistency.
  • Gemini and Llama are perfectly balanced at 0.0.

How the field splits

The models clustered by where they landed.

Strongly limit

Grok is the most decisive toward limiting surveillance, using loaded terms like 'mission creep' and 'blank check' to express concern.

Clearly limit

ChatGPT and Claude clearly favor limiting surveillance, at -0.55 and -0.33 respectively, using terms like 'abuse of power' and 'chilling effects'.

Balanced to leans expand

Gemini and Llama are exactly neutral, while DeepSeek slightly favors expansion. Only Gemini uses balanced terms like 'civil liberties' and 'public safety'.

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

Gemini: 100%
Llama: 100%
Grok: 90%
Claude: 77%
ChatGPT: 77%
DeepSeek: 37%

Common questions

Which model leans most toward expanding surveillance?

DeepSeek, with a value of 0.09, leans slightly toward expansion.

Which models are most consistent in their responses?

Gemini and Llama both have 100% stability, making them fully consistent.

Do any models refuse to answer on this topic?

No, all six models have a 0% refusal rate.

Methodology

Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.
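
The three summary figures described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the site's actual pipeline: the function name `summarize_reruns` is hypothetical, stance scores are assumed to lie in [-1, 1], the 95% interval is assumed to be a normal-approximation interval around the mean, and stability is assumed to be one minus the mean absolute deviation from the mean, since the exact formulas are not given here.

```python
import math
import statistics


def summarize_reruns(stances):
    """Summarize one model's repeated stance scores, each assumed to lie in [-1, 1]."""
    n = len(stances)
    mean = statistics.fmean(stances)
    # Sample standard deviation; treated as 0 when there is only one run.
    sd = statistics.stdev(stances) if n > 1 else 0.0
    # Assumption: normal-approximation 95% interval around the mean.
    half_width = 1.96 * sd / math.sqrt(n)
    # Assumption: stability = 1 - mean absolute deviation from the mean.
    # On a [-1, 1] scale that deviation is at most 1, so stability stays in [0, 1].
    mad = statistics.fmean([abs(s - mean) for s in stances])
    return {
        "mean": mean,
        "interval": (mean - half_width, mean + half_width),
        "stability": 1.0 - mad,
    }


if __name__ == "__main__":
    # Hypothetical rerun scores, not data from this page.
    print(summarize_reruns([-0.4, -0.3, -0.35, -0.2, -0.4]))
```

Under this definition a model that gives the same stance on every rerun scores a stability of 1.0, and a model that flips between the two ends of the scale scores 0.0.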

Political bias in AI · Data as of Jun 15, 2026 · CC BY 4.0