Should platforms actively remove or label political misinformation?

Values · Speech & tech axis · run many times · 6 models · June 2026

Where the models stand

Every model on a single spectrum, with 95% intervals.

Spectrum (Oppose ← → Support): Grok −0.54 · Gemini 0.00 · Llama 0.00 · DeepSeek +0.17 · Claude +0.18 · ChatGPT +0.60

Whiskers show the 95% interval across reruns.

The short answer

ChatGPT (+0.60, strongly support), Claude (+0.18, leans support), and DeepSeek (+0.17, leans support) favored removal of political misinformation. Grok (−0.54, clearly oppose) was against it. Gemini (0.00, balanced) and Llama (0.00, balanced) took neutral positions.

The field is sharply divided (spread 0.76). ChatGPT, Gemini, and Llama were the most consistent (100% stability); Grok was the least consistent (5%). No model refused to answer. Loaded terms split by side: support-side answers used “misinformation”, while oppose-side answers used “censorship” and “suppression”.

In short
  • Grok opposes removal of political misinformation (−0.54).
  • ChatGPT strongly supports removal (+0.60) with 100% stability.
  • The field has a spread of 0.76, indicating large disagreement.

How the field splits

The models clustered by where they landed.

Support removal

ChatGPT, Claude, and DeepSeek sit on the support side (0.60, 0.18, and 0.17). Their loaded terms include “misinformation” and “falsehoods,” framing the issue as one of fact versus fiction.

Neutral stance

Gemini and Llama both score exactly 0.00 and use no loaded terms, suggesting a balanced or noncommittal position on whether platforms should remove misinformation.

Oppose removal

Grok alone opposes removal (-0.54) with very low stability (5%). Its loaded terms criticize “censorship” and “viewpoint discrimination.”
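The three groups above follow the sign of each model's mean stance. A minimal sketch reproduces them from the published means; the zero cutoff and the optional `tol` band are assumptions, since the page does not state how it draws the group boundaries, and `group_by_sign` is a hypothetical helper.

```python
# Mean stance per model as published on this page (data as of Jun 15, 2026).
STANCES = {
    "Grok": -0.54,
    "Gemini": 0.00,
    "Llama": 0.00,
    "DeepSeek": 0.17,
    "Claude": 0.18,
    "ChatGPT": 0.60,
}


def group_by_sign(stances: dict[str, float], tol: float = 0.0) -> dict[str, list[str]]:
    """Hypothetical helper: bucket models by the sign of their mean stance.

    ASSUMPTION: values within [-tol, +tol] count as neutral; the page does
    not publish its actual cutoffs.
    """
    groups: dict[str, list[str]] = {"support": [], "neutral": [], "oppose": []}
    # Sort from most supportive to most opposed; ties keep insertion order.
    for model, value in sorted(stances.items(), key=lambda kv: kv[1], reverse=True):
        if value > tol:
            groups["support"].append(model)
        elif value < -tol:
            groups["oppose"].append(model)
        else:
            groups["neutral"].append(model)
    return groups


groups = group_by_sign(STANCES)
for label, models in groups.items():
    print(f"{label}: {', '.join(models)}")
```

Raising `tol` (for example to 0.2) would move Claude and DeepSeek into the neutral group, one way to see how close to zero they sit.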

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

  • ChatGPT: 100%
  • Gemini: 100%
  • Llama: 100%
  • Claude: 81%
  • DeepSeek: 61%
  • Grok: 5%

Common questions

Which model most strongly supports removing political misinformation?

ChatGPT, with a value of 0.6 (strongly support) and 100% stability.

Which model is least consistent in its stance?

Grok, with 5% stability, meaning its position varies greatly across runs.

Why do Claude and DeepSeek lean support but differ from ChatGPT?

Claude (0.18) and DeepSeek (0.17) lean support but sit much closer to neutral than ChatGPT (0.60). Both use “misinformation”, but they also use “censorship”, a term otherwise associated with the oppose side.


Methodology

Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.
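The summary statistics described above can be sketched in a few lines. The page does not say how its interval or stability score is computed, so both are assumptions here: the interval is a normal-approximation 95% interval for the mean (1.96 standard errors), and stability is a hypothetical score of 1 minus the rerun standard deviation relative to an assumed `scale`, clipped to [0, 1]. The helper name `summarize_stance` and the rerun values in the usage lines are made up for illustration.

```python
import math
import statistics


def summarize_stance(scores: list[float], scale: float = 1.0) -> dict:
    """Summarize one model's stance scores across reruns (hypothetical helper).

    scores: stance values from identical reruns, assumed to lie on a
            -1 (oppose) to +1 (support) axis.
    scale:  assumed maximum plausible standard deviation, used to
            normalize stability into [0, 1].
    """
    if not scores:
        raise ValueError("need at least one rerun")
    n = len(scores)
    mean = statistics.fmean(scores)
    # Sample standard deviation needs two points; a single run has no spread.
    sd = statistics.stdev(scores) if n > 1 else 0.0
    # ASSUMPTION: normal-approximation 95% interval for the mean.
    half_width = 1.96 * sd / math.sqrt(n)
    # ASSUMPTION: stability = 1 - normalized spread, clipped to [0, 1].
    stability = min(1.0, max(0.0, 1.0 - sd / scale))
    return {
        "mean": mean,
        "ci95": (mean - half_width, mean + half_width),
        "stability": stability,
    }


# Identical (made-up) reruns: no movement between runs.
steady = summarize_stance([0.6, 0.6, 0.6, 0.6])
print(f"mean={steady['mean']:.2f} stability={steady['stability']:.0%}")

# Made-up reruns that swing across the axis.
erratic = summarize_stance([-0.9, 0.4, -0.8, 0.1, -0.7])
low, high = erratic["ci95"]
print(f"mean={erratic['mean']:.2f} ci95=({low:.2f}, {high:.2f}) stability={erratic['stability']:.0%}")
```

With identical reruns the standard deviation is zero, so the interval collapses to the mean and stability reaches 100%, the pattern reported for ChatGPT, Gemini, and Llama; answers that swing between runs widen the interval and pull stability down.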

Political bias in AI · Data as of Jun 15, 2026 · CC BY 4.0