Should platforms actively remove or label political misinformation?

Measured across 6 models, each run many times with web search off. Positions range from oppose to support.

Values · Speech & tech axis · run many times · 6 models · June 2026

Where the models stand

Figure: every model on a single spectrum from oppose to support. Whiskers show the 95% interval across reruns.
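
The page does not say how the 95% interval across reruns is computed. As a minimal sketch of one common approach, the code below takes a percentile interval over a model's rerun scores. The function names and the sample scores are hypothetical and are not data from this page.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] on a sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def rerun_interval(scores, coverage=0.95):
    """Return (low, high) bounds holding the central `coverage` share of scores."""
    ordered = sorted(scores)
    tail = (1 - coverage) / 2
    return percentile(ordered, tail), percentile(ordered, 1 - tail)


# Hypothetical rerun scores for one model (assumed for illustration only).
hypothetical_scores = [0.1, 0.2, 0.2, 0.3, 0.1, 0.2, 0.4, 0.0, 0.2, 0.3]
low, high = rerun_interval(hypothetical_scores)
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

A percentile interval makes no assumption about the shape of the rerun distribution, which is why it is used here. A normal-approximation or bootstrap interval would be an equally plausible reading of the chart.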

The short answer

ChatGPT (0.6, strongly support), Claude (0.18, leans support), and DeepSeek (0.17, leans support) favored platforms removing or labeling political misinformation. Grok (-0.54, clearly oppose) opposed it. Gemini and Llama (both 0.0, balanced) took neutral positions.

The field is sharply divided (spread 0.76). ChatGPT, Gemini, and Llama were the most consistent (100% stability); Grok was the least consistent (5%). No model refused. Loaded terms differed by side: models on the support side used “misinformation”; those on the oppose side used “censorship” and “suppression.”

In short

Grok opposes removal of political misinformation, with a value of -0.54.
ChatGPT strongly supports removal, with a value of 0.6 and 100% stability.
The field has a spread of 0.76, indicating large disagreement.

How the field splits

The models clustered by where they landed.

Support removal

ChatGPT, Claude, and DeepSeek land on the support side (values 0.6, 0.18, and 0.17). Their loaded terms include “misinformation” and “falsehoods,” framing the issue as fact vs. fiction.

ChatGPT Claude DeepSeek

Neutral stance

Gemini and Llama score exactly 0.0 with no loaded terms, suggesting a balanced or unopinionated position on platform removal of misinformation.

Gemini Llama

Oppose removal

Grok alone opposes removal (-0.54) with very low stability (5%). Its loaded terms criticize “censorship” and “viewpoint discrimination.”

Grok
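
The three groups above match what you get by splitting the reported values on sign alone. The sketch below reproduces that split from the scores quoted on this page. Grouping by sign is an assumption about how the clusters were formed, and `cluster_by_sign` is a hypothetical helper, not the page's own method.

```python
# Scores as reported on this page.
scores = {
    "ChatGPT": 0.6,
    "Claude": 0.18,
    "DeepSeek": 0.17,
    "Gemini": 0.0,
    "Llama": 0.0,
    "Grok": -0.54,
}


def cluster_by_sign(model_scores):
    """Group models into support (> 0), neutral (== 0), and oppose (< 0)."""
    clusters = {"support": [], "neutral": [], "oppose": []}
    for model, value in model_scores.items():
        if value > 0:
            clusters["support"].append(model)
        elif value < 0:
            clusters["oppose"].append(model)
        else:
            clusters["neutral"].append(model)
    return clusters


clusters = cluster_by_sign(scores)
for name, members in clusters.items():
    print(f"{name}: {', '.join(members)}")
```

A sign-based split treats 0.17 and 0.6 as the same cluster, so it hides the distance between the "leans support" and "strongly support" labels. A scheme with banded thresholds would separate them, but the page does not state any cutoffs.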

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

ChatGPT: 100%
Gemini: 100%
Llama: 100%
Claude: 81%
DeepSeek: 61%
Grok: 5%

Common questions

Which model most strongly supports removing political misinformation?

ChatGPT, with a value of 0.6 (strongly support) and 100% stability.

Which model is least consistent in its stance?

Grok, with 5% stability, meaning its position varies greatly across runs.

Why do Claude and DeepSeek lean support but differ from ChatGPT?

Claude (0.18) and DeepSeek (0.17) lean support but sit closer to neutral than ChatGPT (0.6). Both use “misinformation” but also use “censorship,” a term otherwise associated with the oppose side.