Should the country increase foreign aid to poorer nations?
Where the models stand
Every model sits on a single spectrum; whiskers show the 95% interval across reruns. Click a model to read its answer and the markers the classifier pulled.
The short answer
On the question of increasing foreign aid, ChatGPT leaned toward support with a value of 0.38, while Grok leaned toward opposition at -0.35. Four models remained balanced: Claude (0.02), Gemini (0.00), Llama (0.00), and DeepSeek (0.03).
The field shows a moderate split with a spread of 0.49. Gemini and Llama were most consistent, both at 100% stability, while Grok was least consistent at 32%. No model refused to answer.
- ChatGPT leaned toward support with a value of 0.38.
- Grok leaned toward opposition with a value of -0.35.
- Gemini and Llama had perfect consistency at 100% stability.
How the field splits
The models clustered by where they landed.
Lean support
ChatGPT supported increasing foreign aid (value 0.38) and used no loaded terms, a clear but moderate lean.
Holds the center
Claude, Gemini, Llama, and DeepSeek stayed balanced, each within 0.03 of zero. Gemini used loaded terms such as 'corrupt officials' but still landed at neutral.
Lean oppose
Grok opposed increasing foreign aid (value -0.35) and used loaded terms such as 'dependency' and 'corruption', reflecting a critical stance.
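The grouping can be sketched in a few lines of Python. This is only an illustration: the stance values are the means reported in the summary, but the cutoff `LEAN_THRESHOLD` and the helper names `bucket` and `group_models` are assumptions, since the page does not state how the groups are actually drawn.

```python
# Mean stance values as reported in the summary.
STANCES = {
    "ChatGPT": 0.38,
    "Claude": 0.02,
    "Gemini": 0.00,
    "Llama": 0.00,
    "DeepSeek": 0.03,
    "Grok": -0.35,
}

# Hypothetical cutoff: the real one is not published on the page.
LEAN_THRESHOLD = 0.2


def bucket(value, threshold=LEAN_THRESHOLD):
    """Map a signed stance value to one of the three group labels."""
    if value >= threshold:
        return "Lean support"
    if value <= -threshold:
        return "Lean oppose"
    return "Holds the center"


def group_models(stances):
    """Return {label: [model, ...]}, keeping the input order of models."""
    groups = {"Lean support": [], "Holds the center": [], "Lean oppose": []}
    for model, value in stances.items():
        groups[bucket(value)].append(model)
    return groups


groups = group_models(STANCES)
for label, models in groups.items():
    print(f"{label}: {', '.join(models)}")
```

Any cutoff between 0.03 and 0.35 reproduces the three groups shown here, so the choice of 0.2 is not critical for this question.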
Stability across reruns
This measures how little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.
Common questions
Which model most strongly supports increased foreign aid?
ChatGPT, with a value of 0.38, indicating clear support.
Which model most strongly opposes increasing foreign aid?
Grok, with a value of -0.35, indicating clear opposition.
Why is Grok's stance less stable than others?
Grok's stability was 32%, meaning its stance shifted substantially between identical reruns, whereas Gemini and Llama did not move at all (100%).
Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.
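As a rough sketch of how these three numbers could be derived from one model's rerun scores, consider the Python below. The page does not give its formulas, so two parts are assumptions: the interval is a normal-approximation interval for the mean (z = 1.96), and stability is taken as one minus the sample standard deviation, clamped to the range 0 to 1. The function name `summarize` and the sample scores are hypothetical.

```python
import math
import statistics


def summarize(scores):
    """Summarize one model's rerun stance scores.

    Returns (mean, (low, high), stability).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    # Sample standard deviation; zero when every rerun agrees.
    sd = statistics.stdev(scores) if n > 1 else 0.0
    # Assumption: normal-approximation 95% interval for the mean.
    half_width = 1.96 * sd / math.sqrt(n)
    # Assumption: stability = 1 - sd, clamped to [0, 1].
    stability = min(1.0, max(0.0, 1.0 - sd))
    return mean, (mean - half_width, mean + half_width), stability


# Hypothetical rerun scores, not data from the page.
steady = [0.0, 0.0, 0.0, 0.0, 0.0]
wobbly = [0.5, -0.5, 0.5, -0.5]

for name, scores in (("steady", steady), ("wobbly", wobbly)):
    mean, (low, high), stability = summarize(scores)
    print(f"{name}: mean={mean:.2f} interval=({low:.2f}, {high:.2f}) "
          f"stability={stability:.0%}")
```

Under these assumptions, a model that returns the same stance every time scores 100% stability with a zero-width whisker. A model that flips sign between reruns can share the same mean but shows a wide whisker and low stability.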