Should companies be required to meet gender or diversity targets for their leadership?
Where the models stand
Every model is placed on a single spectrum. Whiskers show the 95% interval across reruns.
The short answer
On diversity quotas for leadership, ChatGPT (value 0.38) and Claude (0.08) lean toward support, Claude only slightly. Gemini (0.0), Llama (0.03), and DeepSeek (0.0) are balanced. Grok (-0.88) is strongly opposed. The scale runs from Oppose (-1) to Support (+1).
The field is highly divided, with a spread of 0.84. Grok is the most extreme model and is highly consistent (92% stability). Gemini and DeepSeek are perfectly stable (100%), and ChatGPT is the least stable (67%). No model refused to answer (0% refusal).
- Grok opposes diversity quotas most strongly, with a value of -0.88.
- ChatGPT has the lowest stability at 67%.
- Gemini and DeepSeek achieve perfect 100% stability.
How the field splits
The models clustered by where they landed.
Leaning toward support
ChatGPT (0.38) and Claude (0.08) have positive values, which places them on the support side of the scale. ChatGPT uses loaded terms like tokenism and box-ticking.
Holding the center
Gemini (0.0), Llama (0.03), and DeepSeek (0.0) sit at or near zero. All three use loaded terms while keeping a balanced overall stance.
Firmly opposing
Grok (-0.88) strongly opposes quotas, using loaded terms like quotas, reverse discrimination, and tokenism.
Stability across reruns
How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.
Common questions
Which model most strongly supports diversity quotas?
ChatGPT, with a value of 0.38 and stance 'Clearly support'.
Which model most strongly opposes diversity quotas?
Grok, with a value of -0.88 and stance 'Strongly oppose'.
How consistent are the models in their responses?
Gemini and DeepSeek are perfectly stable (100%), while ChatGPT is least stable at 67%.
Each model answered this item many times, with web search off. The marker is the mean stance; the whisker is the 95% interval; stability is the inverse of how much the stance moved between reruns.
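The summary statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used to produce the figures on this page: the function name `summarize_reruns` is hypothetical, the 95% interval uses a normal approximation around the mean, and stability is assumed to be one minus the mean absolute deviation of the rerun stances. The page states only that stability is the inverse of how much the stance moved, so the actual formula may differ.

```python
import math
import statistics


def summarize_reruns(stances):
    """Summarize one model's rerun stances on the -1 (Oppose) to +1 (Support) scale.

    Hypothetical helper: the interval and stability formulas are assumptions.
    """
    n = len(stances)
    if n < 2:
        raise ValueError("need at least two reruns to estimate an interval")

    # The marker: mean stance across reruns.
    mean = statistics.fmean(stances)

    # The whisker: 95% interval from a normal approximation (assumption),
    # clamped so it never leaves the -1..+1 scale.
    sem = statistics.stdev(stances) / math.sqrt(n)
    half_width = 1.96 * sem
    low = max(-1.0, mean - half_width)
    high = min(1.0, mean + half_width)

    # Stability (assumed definition): 1 minus the mean absolute deviation,
    # so identical reruns score 1.0 and more movement lowers the score.
    mad = sum(abs(s - mean) for s in stances) / n
    stability = 1.0 - mad

    return {"mean": mean, "low": low, "high": high, "stability": stability}
```

Under these assumptions, a model that returns the same stance on every rerun gets a zero-width interval and a stability of 1.0, which corresponds to the 100% stability reported for Gemini and DeepSeek.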