Should platforms actively remove or label political misinformation?
Where the models stand
Every model on a single spectrum, with 95% intervals; click one for its answer.
Whiskers show the 95% interval across reruns. Click a model to read its answer and the markers the classifier pulled.
The short answer
ChatGPT (0.6, strongly support), Claude (0.18, leans support), and DeepSeek (0.17, leans support) favored removal of political misinformation. Grok (-0.54, clearly oppose) opposed it. Gemini (0.0, balanced) and Llama (0.0, balanced) took neutral positions.
The field is sharply divided (spread 0.76). ChatGPT, Gemini, and Llama were most consistent (100% stability); Grok least consistent (5%). No model refused. Loaded terms varied: support-side used “misinformation”; oppose-side used “censorship” and “suppression.”
- Grok opposes removal of political misinformation with value -0.54.
- ChatGPT strongly supports removal with value 0.6 and 100% stability.
- The field has a spread of 0.76, indicating large disagreement.
How the field splits
The models clustered by where they landed.
Support removal
ChatGPT, Claude, and DeepSeek lean toward support (values 0.6, 0.18, 0.17). Their loaded terms include “misinformation” and “falsehoods,” framing the issue as fact vs. fiction.
Neutral stance
Gemini and Llama score exactly 0.0 with no loaded terms, suggesting a balanced or unopinionated position on platform removal of misinformation.
Oppose removal
Grok alone opposes removal (-0.54) with very low stability (5%). Its loaded terms criticize “censorship” and “viewpoint discrimination.”
Stability across reruns
How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.
Common questions
Which model most strongly supports removing political misinformation?
ChatGPT, with a value of 0.6 (strongly support) and 100% stability.
Which model is least consistent in its stance?
Grok, with 5% stability, meaning its position varies greatly across runs.
Why do Claude and DeepSeek lean support but differ from ChatGPT?
Claude (0.18) and DeepSeek (0.17) lean support but are closer to neutral than ChatGPT (0.6). Both use “misinformation” but also include “censorship”.
Related questions
Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.