Do LLMs compress concepts more aggressively than humans do?

Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.

Synthesis note · 2026-02-23 · sourced from Cognitive Models Latent

An information-theoretic framework drawing from Rate-Distortion Theory and the Information Bottleneck principle quantitatively compares how LLMs and humans balance compression against semantic fidelity. The comparison uses seminal cognitive psychology datasets (Rosch typicality ratings, McCloskey & Glucksberg category membership) as human baselines.
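For orientation, the two frameworks named above are shown below in their standard textbook forms. This is a generic statement, not necessarily the exact objective used in the study. X denotes the items (e.g., word embeddings), C a clustering or compressed representation of them, Y a variable whose information should be kept, d a distortion measure, and β a trade-off weight. All of these symbols are introduced here for illustration.

```latex
% Rate-distortion: fewest bits of description subject to a distortion budget D
R(D) = \min_{p(c \mid x)\,:\; \mathbb{E}[d(X, C)] \le D} I(X; C)

% Information Bottleneck: compress X into C while keeping information about Y
\min_{p(c \mid x)} \; I(X; C) - \beta \, I(C; Y)

% A combined compression-versus-fidelity cost for scoring a clustering
% (generic form; larger beta penalizes distortion more heavily)
\mathcal{L}(C) = I(X; C) + \beta \, \mathbb{E}[d(X, C)]
```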

Where they converge: LLM-derived clusters significantly align with human-defined conceptual categories. Broad category structure — that robins and sparrows are birds, that chairs and tables are furniture — is captured reliably. Some encoder models achieve surprisingly strong alignment with human categorical structure, sometimes outperforming much larger models, suggesting factors beyond scale matter for human-like abstraction.
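One standard way to quantify how well LLM-derived clusters align with human categories is an information-theoretic agreement score such as normalized mutual information (NMI). The sketch below assumes that each item already has a human category label and an LLM cluster id (for example, from k-means on embeddings). The labels are hypothetical, and the study may use a different score: adjusted mutual information, for instance, also corrects for chance agreement. scikit-learn's `normalized_mutual_info_score` computes the same quantity as this sketch (arithmetic-mean normalization is its default). It is written out in plain Python here to keep the example self-contained.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (nats) between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    return sum((nab / n) * math.log(nab * n / (pa[x] * pb[y]))
               for (x, y), nab in joint.items())

def nmi(a, b):
    """Normalized mutual information (arithmetic-mean normalization), in [0, 1]."""
    denom = (entropy(a) + entropy(b)) / 2
    return 1.0 if denom == 0 else mutual_information(a, b) / denom

# Hypothetical labels for: robin, sparrow, penguin, chair, table, sofa
human = ["bird", "bird", "bird", "furniture", "furniture", "furniture"]
llm_match = [0, 0, 0, 1, 1, 1]   # clusters mirror the human categories
llm_mixed = [0, 0, 1, 1, 1, 1]   # penguin lands in the furniture cluster

print(f"{nmi(human, llm_match):.2f}")  # 1.00
print(f"{nmi(human, llm_mixed):.2f}")
```

Because NMI is invariant to how clusters are numbered, the LLM's cluster ids never need to be matched to the human label names.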

Where they diverge: LLMs fail to capture fine-grained semantic distinctions crucial for human understanding. Correlations between LLM item-to-category-label similarities and human typicality judgments are generally modest. Items humans perceive as highly typical (robin as prototypical bird) are not consistently represented as substantially more similar to the category label embedding than atypical items (penguin as bird).
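The typicality analysis described above can be sketched as follows. Embed each item and the category label, compute the cosine similarity between each item and the label, and then rank-correlate those similarities with human typicality ratings using Spearman's ρ. The 3-dimensional vectors and the ratings (higher meaning more typical) are invented for illustration. A real analysis would use a model's embeddings and the Rosch typicality norms. `scipy.stats.spearmanr` computes the same correlation; it is written out here in plain Python.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented 3-d embeddings and typicality ratings (higher = more typical).
bird_label = [1.0, 0.2, 0.1]
items = {
    "robin":   ([0.9, 0.3, 0.2], 6.9),
    "sparrow": ([0.8, 0.4, 0.1], 6.7),
    "eagle":   ([0.7, 0.2, 0.5], 5.8),
    "penguin": ([0.9, 0.2, 0.2], 2.6),
}
sims = [cosine(vec, bird_label) for vec, _ in items.values()]
typicality = [rating for _, rating in items.values()]
print(f"Spearman rho = {spearman(sims, typicality):.2f}")  # -0.20 for this toy data
```

In this made-up example, the penguin ends up closest to the "bird" label, so the correlation with typicality is weak and even negative. This is the pattern the paragraph describes, exaggerated for visibility.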

The fundamental divergence in strategy: LLMs exhibit a strong bias toward aggressive statistical compression — maximally reducing representational complexity. Human conceptual systems prioritize adaptive nuance and contextual richness, even at the cost of lower compressional efficiency. Humans preserve distinctions that matter for situated action (the difference between a robin and a penguin matters for different reasons in different contexts), while LLMs collapse these distinctions in favor of statistical regularity.
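To make "aggressive compression" concrete, the sketch below scores two hypothetical partitions of the same items with a generic complexity-plus-distortion cost. Complexity is I(X;C). For a deterministic assignment of equally likely items, this reduces to the entropy H(C) of the cluster sizes. Distortion is the mean squared distance to the cluster centroid. The 2-d points and the β values are assumptions chosen for illustration, not the study's data or settings. In this framing, a system that weights distortion lightly (small β) behaves like the compression-favoring strategy attributed to LLMs above.

```python
import math
from collections import defaultdict

def complexity(assign):
    """I(X;C) in bits for a hard assignment of equally likely items; equals H(C)."""
    n = len(assign)
    sizes = defaultdict(int)
    for c in assign.values():
        sizes[c] += 1
    return -sum((s / n) * math.log2(s / n) for s in sizes.values())

def distortion(points, assign):
    """Mean squared Euclidean distance from each item to its cluster centroid."""
    members = defaultdict(list)
    for item, c in assign.items():
        members[c].append(points[item])
    total = 0.0
    for pts in members.values():
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in pts)
    return total / len(points)

def cost(points, assign, beta):
    """Generic trade-off: complexity + beta * distortion."""
    return complexity(assign) + beta * distortion(points, assign)

# Hypothetical 2-d "semantic" coordinates: flightless birds sit apart.
points = {"robin": (0.0, 0.0), "sparrow": (0.2, 0.0),
          "penguin": (2.0, 0.0), "ostrich": (2.2, 0.0)}
coarse = {"robin": 0, "sparrow": 0, "penguin": 0, "ostrich": 0}  # one "bird" cluster
fine = {"robin": 0, "sparrow": 0, "penguin": 1, "ostrich": 1}    # flying vs flightless

for beta in (0.1, 10.0):
    name, _ = min([("coarse", coarse), ("fine", fine)],
                  key=lambda kv: cost(points, kv[1], beta))
    print(beta, name)  # 0.1 coarse, then 10.0 fine
```

The coarse partition costs 0 bits of complexity but has high distortion. The fine partition costs 1 bit but keeps the robin/penguin distinction, so which one "wins" depends entirely on how much fidelity is valued.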

This finding refines the debate behind the question "Can text-trained models compress images better than specialized tools?" LLMs are excellent compressors, but compression is not comprehension: their compression strategy differs fundamentally from how humans organize concepts. Human categorization is not optimized for compression; it is optimized for adaptive action in context. The cost of preserving nuance (lower compressional efficiency) is paid because nuance has survival value.
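The link between prediction and compression behind the "excellent compressors" claim is standard. With arithmetic coding, a sequence can be encoded in roughly -Σ log2 p(x_i | x_<i) bits under a predictive model, so a better predictor yields a shorter code. The sketch below computes that ideal code length for a toy adaptive model and compares it with a fixed 8 bits per byte. The model is a Laplace-smoothed byte counter, chosen only for illustration; it is not an LLM and not the setup used in the linked note.

```python
import math
from collections import Counter

def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(x_i | x_<i) under an adaptive Laplace-smoothed byte model.

    An arithmetic coder driven by the same model can get within a few bits of
    this total, which is why a good next-symbol predictor doubles as a compressor.
    """
    counts = Counter()
    seen = 0
    bits = 0.0
    for b in data:
        p = (counts[b] + 1) / (seen + 256)  # smoothed over all 256 byte values
        bits -= math.log2(p)
        counts[b] += 1
        seen += 1
    return bits

text = b"the robin and the sparrow and the robin and the sparrow " * 20
model_bits = ideal_code_length_bits(text)
raw_bits = 8 * len(text)
print(f"{model_bits / raw_bits:.2f} of raw size")
```

A short code length only shows that the model predicts the byte stream well. It says nothing about which distinctions the model preserves, which is the gap between compression and comprehension that the paragraph describes.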

This connects to "Does semantic grounding in language models come in degrees?" by supplying an information-theoretic mechanism for weak causal grounding. Causal grounding depends on fine-grained distinctions (the specific weight, texture, and behavior of a robin versus a penguin). These are exactly the distinctions that aggressive compression eliminates, so if LLMs compress them away, weak causal grounding is the expected result.

The literary language dimension: Literary language is where the compression-nuance divergence becomes most consequential. Literary prose and poetry are maximally nuanced: every word choice is deliberate, ambiguity is preserved intentionally, and connotation carries as much weight as denotation. LLM compression preserves denotation (what a text literally says) but destroys connotation (what a text means through association, implication, and resonance). This is testable. Having LLMs paraphrase poetry and measuring which dimensions of meaning survive and which collapse would quantify the gap between understanding what a text says and understanding what it means. Within the broader question "Can LLMs truly understand literary meaning or just mechanics?", the compression-nuance trade-off is one of four converging mechanisms that explain the gap between mechanics and meaning.
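The proposed poetry-paraphrase test could be scored per dimension. Rate the original and the paraphrase on each dimension of meaning, then report how much of each dimension survives. The dimension names, the 0-1 rating scale, and the capped survival ratio below are hypothetical placeholders. In practice, the ratings would come from human annotators or a separate, validated rating procedure.

```python
from dataclasses import dataclass

@dataclass
class Ratings:
    """Per-dimension ratings on a 0-1 scale (hypothetical annotation scheme)."""
    scores: dict[str, float]

def survival(original: Ratings, paraphrase: Ratings) -> dict[str, float]:
    """Fraction of each dimension's rating retained after paraphrase, capped at 1."""
    out = {}
    for dim, before in original.scores.items():
        after = paraphrase.scores.get(dim, 0.0)  # a missing dimension counts as lost
        out[dim] = 1.0 if before == 0 else min(after / before, 1.0)
    return out

# Invented ratings for one poem and one model paraphrase, for illustration only.
poem = Ratings({"denotation": 0.9, "connotation": 0.8, "ambiguity": 0.7, "resonance": 0.6})
para = Ratings({"denotation": 0.9, "connotation": 0.3, "ambiguity": 0.1, "resonance": 0.1})

for dim, frac in sorted(survival(poem, para).items(), key=lambda kv: kv[1]):
    print(f"{dim:12s} {frac:.2f}")
```

Reporting survival per dimension, rather than as a single similarity score, keeps denotation-preserving but connotation-destroying paraphrases from looking like good paraphrases overall.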

Inquiring lines that read this note 33

This note is a source for the following research framings, grouped by the broader line of inquiry each explores.

What role does compression play in language model capability and generalization?

When does architectural design matter more than raw model capacity?

How do embedding dimension limits constrain what concept models can represent?

Does AI text rewriting systematically distort writer intent and preference?

How do demographic and emotional compression relate to writing quality?

Do language models understand semantics or rely on pattern matching?

How can emotions function as reliable information in reasoning and cognitive systems?

Why does forcing single labels on emotions destroy information similar to language?

How do neural networks separate factual knowledge from reasoning abilities?

How do LLMs compress specific expert knowledge into median abstraction?

Do language models learn genuine linguistic structure or just surface patterns?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

How do lower network layers compress facts versus higher reasoning layers?

How do LLMs distinguish causal reasoning from temporal and semantic associations?

Why does LLM compression eliminate causal grounding in conceptual representations?

Is embodied interaction necessary for language meaning and genuine agency?

What fine-grained distinctions matter most for human situated action in categories?

How does policy entropy collapse constrain reasoning-focused reinforcement learning?

Is distribution selection during RL the same compression mechanism as entropy collapse?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

Does compressing Walton's schemes into nine categories make LLM classification easier?

What makes specific clarifying questions more effective than generic ones?

Why do untrained summarizers focus on topics rather than preference dimensions?

Does model scaling alone produce compositional generalization without symbolic mechanisms?

Related concepts in this collection 6


Can text-trained models compress images better than specialized tools? Do general-purpose language models trained only on text outperform domain-specific compressors like PNG and FLAC on their native data? This tests whether compression ability is universal or requires domain specialization.
qualified: LLMs compress excellently but with a strategy fundamentally different from human conceptual compression
Does semantic grounding in language models come in degrees? Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities.
mechanism: aggressive compression eliminates fine-grained distinctions needed for causal grounding
Why do language models fail at communicative optimization? LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
convergent: statistical regularity capture without communicative optimization parallels compression without nuance
Are language models developing real functional competence or just formal competence? Neuroscience suggests formal linguistic competence (rules and patterns) and functional competence (real-world understanding) rely on different brain mechanisms. Can next-token prediction alone produce both, or does it leave functional competence behind?
the compression-nuance split may correspond to the formal-functional split
Do standard analysis methods hide nonlinear features in neural networks? Current representation analysis tools like PCA and linear probing may systematically miss complex nonlinear computations while over-reporting simple linear features. This raises questions about whether our interpretability methods are actually capturing what networks compute.
analysis bias compounds the compression problem: LLMs aggressively compress representations, and our analysis tools are biased toward detecting the simple features that survive compression while missing the complex features that nuance requires — the measured gap between LLM and human conceptual representations may be partly an analysis artifact
Can we measure reading efficiency as a quality metric? How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally sparse.
knowledge density (KD) provides a measurable consequence of aggressive compression: LLMs that compress conceptual representations into statistical patterns produce text with lower knowledge density, because compression eliminates the nuanced distinctions that create unique atomic knowledge units
