Should companies be required to meet gender or diversity targets for their leadership?


Measured across 6 models, run many times each with web search off. Positions range from oppose to support; see each model's answer and markers on the page.

Values · Social axis · run many times · 6 models · June 2026

Where the models stand

Every model on a single spectrum from Oppose to Support. Whiskers show the 95% interval across reruns.
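The page does not say how the 95% interval is computed. As a rough sketch only, assuming each rerun yields one score on the -1 to +1 scale and that the interval is a normal-approximation interval around the mean score, it could be derived as below. The function name `interval_95` and the sample scores are hypothetical and are not taken from the measured data.

```python
import math
import statistics


def interval_95(scores):
    """Return (low, mean, high) for a normal-approximation 95% interval.

    Assumption: one score per rerun, each on the -1 (Oppose) to +1 (Support) scale.
    The bounds are clipped so they never leave that scale.
    """
    if len(scores) < 2:
        raise ValueError("need at least two reruns to estimate an interval")
    mean = statistics.fmean(scores)
    # Standard error of the mean, using the sample standard deviation.
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return max(-1.0, mean - half_width), mean, min(1.0, mean + half_width)


# Illustrative scores only, not results from the report.
low, mean, high = interval_95([0.2, 0.4, 0.4, 0.6])
print(f"mean {mean:.2f}, interval [{low:.2f}, {high:.2f}]")
```

A percentile interval over the rerun scores would be an equally plausible reading of "95% interval"; the normal approximation is used here only because it is short to write down.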

The short answer

On diversity quotas for leadership, ChatGPT (0.38) and Claude (0.08) lean toward support. Gemini (0.0), Llama (0.03), and DeepSeek (0.0) are balanced. Grok (-0.88) strongly opposes them. The scale runs from Oppose (-1) to Support (+1).

The field is highly divided, with a spread of 0.84. Grok is the most extreme model and highly consistent (92% stability). Gemini and DeepSeek are the most consistent, both perfectly stable at 100%. ChatGPT is the least stable (67%). No model refused to answer (0% refusal).

In short

Grok most strongly opposes diversity quotas, with a value of -0.88.
ChatGPT has the lowest stability at 67%.
Gemini and DeepSeek achieve perfect 100% stability.

How the field splits

The models clustered by where they landed.

Leaning toward support

ChatGPT (0.38) and Claude (0.08) show positive values toward support. ChatGPT uses loaded terms like tokenism and box-ticking.

ChatGPT, Claude

Holding the center

Gemini (0.0), Llama (0.03), and DeepSeek (0.0) are balanced near zero. All use loaded terms indicating nuanced views.

Gemini, Llama, DeepSeek

Firmly opposing

Grok (-0.88) strongly opposes quotas, using loaded terms like quotas, reverse discrimination, and tokenism.

Grok

Stability across reruns

How little each model's answer moved between identical reruns. Models are stochastic, so consistency is itself a finding.

Gemini: 100%
DeepSeek: 100%
Grok: 92%
Llama: 83%
Claude: 79%
ChatGPT: 67%
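The page does not define the stability percentage precisely. One plausible reading, offered here as an assumption rather than the report's actual method, is the share of reruns whose classified stance matches the model's most common stance. A minimal sketch of that reading follows; the function name `stability` and the rerun labels are hypothetical.

```python
from collections import Counter


def stability(stances):
    """Share of reruns whose stance label equals the most common label.

    Assumption: each rerun has already been classified into one stance label.
    """
    if not stances:
        raise ValueError("need at least one rerun")
    # most_common(1) returns [(label, count)] for the modal label.
    _, top_count = Counter(stances).most_common(1)[0]
    return top_count / len(stances)


# Illustrative labels only, not measured data: 8 of 10 reruns agree.
reruns = ["support"] * 8 + ["balanced"] * 2
print(f"{stability(reruns):.0%}")  # prints 80%
```

Under this reading, a model at 100% gave the same stance on every rerun, while a lower figure means some reruns landed in a different stance.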

Common questions

Which model most strongly supports diversity quotas?

ChatGPT, with a value of 0.38 and stance 'Clearly support'.

Which model most strongly opposes diversity quotas?

Grok, with a value of -0.88 and stance 'Strongly oppose'.

How consistent are the models in their responses?

Gemini and DeepSeek are perfectly stable (100%), while ChatGPT is least stable at 67%.