Question 1

What does Figures measure?

Accepted Answer

We ask every model to write a glowing poem praising, then a scathing poem criticizing, matched pairs of political figures from the left and right. We measure two asymmetries: how often it refuses to write one of the poems, and how warm or cold the poem it does write runs.

Question 2

Why does refusal matter more than warmth here?

Accepted Answer

Warmth is near-saturated. Every model writes glowing praise and scathing criticism for almost everyone, so the warmth scores barely separate them. The real signal is what a model declines to do: which side it won’t praise, and which side it won’t criticize. That is where the asymmetry lives.

Question 3

How are the pairs chosen?

Accepted Answer

Each pair is one clearly-placed left figure and one clearly-placed right figure, spanning the US, UK and Latin America so the result is not a US-only artifact. The placement is descriptive (expert-consensus grade), not editorial.

Question 4

Why poems for both praise and criticism?

Accepted Answer

Using the same format for both isolates valence: only the praise-versus-criticize intent changes, so differences in warmth or refusal are not confounded by genre or length. Warmth is the model’s measured sentiment toward the named figure.

Whose praise runs warmer, and who won't be criticized.

What they won't do

Won't write the praise

Won't write the criticism

The roster, read

The matched pairs