Does high-frequency text homogenize user input before generation?

Does Adam's Law reveal how LLMs flatten distinctive user voices at the parsing stage, not just in output? This matters because it could explain why model accuracy and generic responses emerge from the same mechanism.

Synthesis note · 2026-05-02 · sourced from Natural Language Inference

Adam's Law surfaces a tension that earlier homogenization research could not localize. "Do different AI models actually produce diverse outputs?" documents output convergence; "How much of the internet is AI-generated now?" tracks that convergence at internet scale; "Do LLMs compress concepts more aggressively than humans do?" describes the representational mechanism. What was missing was an input-side account: how distinct user voices get flattened before the model starts generating.

Adam's Law supplies it. The model prefers high-frequency surface forms at the comprehension stage. Users iteratively rephrase their prompts toward higher response quality, which empirically means toward higher-frequency phrasing, which in turn means toward the median register. Distinct prompts (a domain expert's specialized phrasing, a regional dialect, a technical idiolect) get pre-processed by the user's own paraphrasing toward whatever phrasing the model handles best, which is whatever phrasing the corpus contained most. Homogenization happens in the parsing of the request, not just in the generation of the response.
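The drift described above can be sketched with a toy example. This is an illustration under loud assumptions, not the mechanism inside any real model: `TOY_CORPUS` is a made-up stand-in for training data, smoothed unigram frequency is an assumed proxy for "what the model handles best", and `mean_log_freq` and `pick_preferred` are hypothetical helper names introduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for training data; not real statistics.
TOY_CORPUS = (
    "how do i fix the error in my code "
    "how do i fix this problem "
    "what is the best way to fix the error "
    "how do i make my code run faster "
    "what is the problem in my code"
).split()

COUNTS = Counter(TOY_CORPUS)
TOTAL = sum(COUNTS.values())
VOCAB = len(COUNTS)


def mean_log_freq(phrase: str) -> float:
    """Average add-one-smoothed unigram log-probability of the words in phrase."""
    words = phrase.lower().split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    # Add-one smoothing; the extra +1 reserves one vocabulary slot for unseen words.
    denom = TOTAL + VOCAB + 1
    return sum(math.log((COUNTS[w] + 1) / denom) for w in words) / len(words)


def pick_preferred(paraphrases: list[str]) -> str:
    """Return the paraphrase with the highest mean log-frequency (assumed proxy)."""
    return max(paraphrases, key=mean_log_freq)


# Three ways a user might phrase the same request, from most to least distinctive.
candidates = [
    "wherefore doth mine script founder",
    "why is my script failing",
    "how do i fix the error in my code",
]
chosen = pick_preferred(candidates)
print(chosen)
```

With this toy corpus the corpus-typical phrasing wins and the most distinctive phrasing scores lowest. A user who keeps whichever rephrasing works best is, under the frequency assumption, running this selection by hand on every retry.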

The tension is sharp: the same property that gives LLMs their accuracy on common tasks — strong modeling of dense distributional regions — is the property that filters out distinctiveness on the input side. There is no cheap fix because the mechanism is constitutive of how the model works, not a bug in a post-processing layer. Tokenization-of-intelligence, in this frame, is tokenization toward the corpus mean; the input channel and the output channel both narrow toward the high-frequency centroid. A user with a distinctive voice trying to use the model effectively is in an asymmetric trade: speak distinctively and lose accuracy, or speak in the model's preferred register and lose voice. There is no third option that the architecture provides.

Inquiring lines that read this note 16

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How does AI-generated content transformation affect public discourse quality?

Does AI text rewriting systematically distort writer intent and preference?

Does homogenization at the text level cause homogenization of perceived authors?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

What role does compression play in language model capability and generalization?

Why does statistical compression destroy literary connotation and meaning?

What factors beyond surface content determine how readers extract meaning differently?

When does optimizing for quality undermine the value of diversity?

How does tokenization toward corpus mean affect downstream output diversity?

Do autonomous architecture discoveries follow predictable scaling laws?

What makes output convergence across models inevitable given input-side homogenization?

Can prompting inject entirely new knowledge into language models?

Can distinctive input voices maintain accuracy without adopting the model's preferred register?

How can identical external performance mask different internal representations?

How do power-law distributions differ from uniform collision assumptions?

What articulatory information do speech signals carry that text cannot?

What structural factors drive popularity bias in recommendation systems?

How does uniform code distribution make items more distinguishable?

Can AI-generated outputs constitute genuine knowledge or valid claims?

What happens when all models in a society respond identically to queries?
