Do language models flatten the range of public arguments?
When LLMs write essays on the same topics as humans, do they recover the full spectrum of distinct arguments and reasons people actually make, or do they narrow the deliberative space readers encounter?
Most homogenization studies show that LLM outputs cluster, but they rarely compare model and human distributions under the same task. The argument-collapse study does exactly that, setting human essays from 195 NYT debates and 61 Boston Review forums against 23,384 LLM essays. The gap is structural rather than stylistic: 65.3% of human main arguments are unique within a debate versus 3.4% of LLM ones; among essays sharing a main argument, 41% of human sub-arguments are unique versus 9.1% of LLM ones. Prompting for diversity helps, but a typical model recovers only about half the distinct human arguments, and the added variation often lands outside the human argument space, so diversity prompting adds noise more than it fills the long tail.
This sharpens what "diversity collapse" means. The note "Why do LLMs generate novel ideas from narrow ranges?" found the same set-level deficit in research ideas; this one extends it to public deliberation, where the cost is civic rather than scientific: dominant arguments get amplified and long-tail reasoning disappears from what readers ever see. It also concretizes "Do different AI models actually produce diverse outputs?" at the granularity of argumentative structure: LLM essays follow a fixed arc, opening with a direct claim and then moving to proposals. And it grounds the macro claim in "Does AI homogenize culture the way mass media did?" with debate-level evidence.
The honest counterargument, which the authors flag: distinctiveness is not quality. Human arguments are more often unique but not necessarily more accurate or persuasive. So the harm is not "LLMs argue worse" but "LLMs flatten the range of arguments in circulation". That is an ecology effect no single output reveals; only the distribution does. The distribution is the right unit of analysis, and the one most homogenization claims skip.
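The set-level measures discussed above (within-debate uniqueness, coverage of the human argument space, and variation that lands outside it) can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: it assumes each essay has already been mapped to a canonical argument label, it assumes "unique" means a label used by exactly one essay in the debate, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def unique_fraction(labels):
    """Share of essays whose argument label appears exactly once in the debate.

    Assumed definition of "unique within a debate"; the study may define it differently.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    return sum(1 for label in labels if counts[label] == 1) / len(labels)


def coverage(human_labels, llm_labels):
    """Share of distinct human arguments that also appear among the LLM essays."""
    human = set(human_labels)
    if not human:
        return 0.0
    return len(human & set(llm_labels)) / len(human)


def off_space_fraction(human_labels, llm_labels):
    """Share of distinct LLM arguments that no human essay makes."""
    llm = set(llm_labels)
    if not llm:
        return 0.0
    return len(llm - set(human_labels)) / len(llm)


# Toy debate with made-up labels: one label per essay.
human = ["A", "B", "C", "D", "D"]
llm = ["A", "A", "A", "D", "X"]

print("human unique:", unique_fraction(human))
print("llm unique:", unique_fraction(llm))
print("coverage:", coverage(human, llm))
print("off-space:", off_space_fraction(human, llm))
```

None of these numbers can be read off a single essay. Each is defined over the set of essays in a debate, which is why the comparison has to be made at the level of distributions.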
Related concepts in this collection 3
- Why do LLMs generate novel ideas from narrow ranges?
  LLM research agents produce individually novel ideas but cluster them in homogeneous sets. This explores why high average novelty coexists with poor diversity coverage and what it means for automated ideation.
  extends: same set-level diversity deficit, moved from research ideas to public deliberation
- Do different AI models actually produce diverse outputs?
  Explores whether using multiple different language models together creates genuine diversity or whether shared training and alignment cause them to converge on similar answers despite independence.
  exemplifies: convergence now shown at the level of argument structure and supporting reasons
- Does AI homogenize culture the way mass media did?
  If AI generates contextually unique outputs, how can its underlying form be homogeneous? This explores whether AI repeats the culture industry's pattern of suppressing novelty under the guise of variety.
  grounds: debate-level evidence for the mass-generated-similar-flows thesis
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Argument Collapse: LLMs Flatten Long-Form Public Debate
- Has the Creativity of Large-Language Models peaked? —an analysis of inter- and intra-LLM variability —
- Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation
- The Thin Line Between Comprehension and Persuasion in LLMs
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments
- Can Language Models Recognize Convincing Arguments?
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Original note title
argument collapse measures homogenization where it actually matters — against the human distribution at the level of claims and reasons not just word choice