Do LLMs compress concepts more aggressively than humans do?
Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.
An information-theoretic framework drawing from Rate-Distortion Theory and the Information Bottleneck principle quantitatively compares how LLMs and humans balance compression against semantic fidelity. The comparison uses seminal cognitive psychology datasets (Rosch typicality ratings, McCloskey & Glucksberg category membership) as human baselines.
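The trade-off described above can be sketched as a rate-distortion style score for a hard clustering. This is a minimal illustration, not the framework's actual implementation. It assumes equiprobable items (so the complexity term I(X;C) reduces to the entropy of the cluster sizes), squared Euclidean distance to the cluster centroid as the distortion, and a hypothetical weight `beta`. The toy vectors and the function names are made up for illustration.

```python
import math
from collections import Counter


def complexity_bits(labels):
    """I(X;C) in bits for a hard clustering of equiprobable items (equals H(C))."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def distortion(vectors, labels):
    """Mean squared Euclidean distance from each item to its cluster centroid."""
    clusters = {}
    for vec, lab in zip(vectors, labels):
        clusters.setdefault(lab, []).append(vec)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        total += sum(
            sum((v[d] - centroid[d]) ** 2 for d in range(dim)) for v in members
        )
    return total / len(vectors)


def rd_objective(vectors, labels, beta=1.0):
    """Complexity plus beta-weighted distortion; lower is better."""
    return complexity_bits(labels) + beta * distortion(vectors, labels)


# Hypothetical 2-D "embeddings"; real ones would come from a model.
toy_vectors = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
coarse = [0, 0, 1, 1]  # two broad categories
fine = [0, 1, 2, 3]    # every item kept distinct

for name, labels in (("coarse", coarse), ("fine", fine)):
    print(name, complexity_bits(labels), distortion(toy_vectors, labels))
```

On these toy vectors the coarse grouping costs 1 bit with distortion 1.0, and the fine grouping costs 2 bits with distortion 0.0. A small `beta` therefore favors the coarse grouping (aggressive compression), while a large `beta` favors the fine one (preserved distinctions).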
Where they converge: LLM-derived clusters significantly align with human-defined conceptual categories. Broad category structure — that robins and sparrows are birds, that chairs and tables are furniture — is captured reliably. Some encoder models achieve surprisingly strong alignment with human categorical structure, sometimes outperforming much larger models, suggesting factors beyond scale matter for human-like abstraction.
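One way to put a number on this kind of cluster-to-category alignment is normalized mutual information (NMI) between cluster assignments and human category labels. The sketch below is an assumption about how such a comparison could be run, not the source's stated method. The labels are toy values. NMI is not corrected for chance, so a chance-adjusted score may be preferable on real data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(nij * n / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization: 2 * I(A;B) / (H(A) + H(B))."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 1.0  # both labelings are a single group, trivially identical
    return 2 * mutual_information(a, b) / (ha + hb)


# Toy data: human categories versus hypothetical model cluster ids.
human = ["bird", "bird", "furniture", "furniture"]
aligned_clusters = [0, 0, 1, 1]
unrelated_clusters = [0, 1, 0, 1]

print(nmi(human, aligned_clusters), nmi(human, unrelated_clusters))
```

A score near 1 means the clusters reproduce the human categories. A score near 0 means the clusters carry no information about them.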
Where they diverge: LLMs fail to capture fine-grained semantic distinctions crucial for human understanding. Correlations between LLM item-to-category-label similarities and human typicality judgments are generally modest. Items humans perceive as highly typical (robin as prototypical bird) are not consistently represented as substantially more similar to the category label embedding than atypical items (penguin as bird).
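The typicality comparison can be sketched as a rank correlation between item-to-label cosine similarity and human ratings. Everything concrete below is an assumption for illustration: the three-dimensional vectors, the rating values, and the choice of Spearman's rho are hypothetical and are not taken from the Rosch data or from any model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical embeddings and ratings (higher rating = more typical).
label_vec = [1.0, 0.2, 0.0]  # stand-in for the "bird" label embedding
items = {
    "robin": [0.9, 0.3, 0.1],
    "sparrow": [0.8, 0.4, 0.2],
    "ostrich": [0.6, 0.1, 0.7],
    "penguin": [0.7, 0.2, 0.5],
}
human_typicality = {"robin": 6.9, "sparrow": 6.5, "ostrich": 3.0, "penguin": 2.5}

names = list(items)
sims = [cosine(items[n], label_vec) for n in names]
rho = spearman(sims, [human_typicality[n] for n in names])
print(rho)
```

A modest rho on real data would correspond to the pattern described above: the model places robin and penguin at similar distances from the category label even though humans rate them very differently.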
The fundamental divergence in strategy: LLMs exhibit a strong bias toward aggressive statistical compression — maximally reducing representational complexity. Human conceptual systems prioritize adaptive nuance and contextual richness, even at the cost of lower compressional efficiency. Humans preserve distinctions that matter for situated action (the difference between a robin and a penguin matters for different reasons in different contexts), while LLMs collapse these distinctions in favor of statistical regularity.
This finding refines the debate around "Can text-trained models compress images better than specialized tools?". LLMs are excellent compressors, but compression is not comprehension: their compression strategy differs fundamentally from how humans organize concepts. Human categorization isn't optimized for compression; it's optimized for adaptive action in context. The "cost" of preserving nuance (lower compressional efficiency) is paid because nuance has survival value.
This connects to "Does semantic grounding in language models come in degrees?" by supplying an information-theoretic mechanism for weak causal grounding. Causal grounding requires exactly the nuance that compression eliminates: if LLMs compress away the fine-grained distinctions that support causal reasoning (the specific weight, texture, and behavior of a robin versus a penguin), the resulting representation lacks what causal grounding needs.
The literary language dimension: Literary language is where the compression-nuance divergence becomes most consequential. Literary prose and poetry are maximally nuanced: every word choice is deliberate, ambiguity is preserved intentionally, and connotation carries as much weight as denotation. On this account, LLM compression preserves denotation (what a text literally says) but destroys connotation (what a text means through association, implication, and resonance). This is testable: having LLMs paraphrase poetry and measuring which dimensions of meaning survive versus collapse would quantify the gap between understanding what a text says and understanding what a text means. In the related note "Can LLMs truly understand literary meaning or just mechanics?", the compression-nuance trade-off is one of four converging mechanisms that explain the mechanics-meaning gap.
Inquiring lines that use this note as a source (26)
This note is a source for these synthesized inquiries.
- Why does statistical compression destroy literary connotation and meaning?
- Why does language compression via statistical dependencies capture cultural and situated language use?
- Can linguistic compression be a fundamental mechanism for representing psychology?
- How do embedding dimension limits constrain what concept models can represent?
- How do demographic and emotional compression relate to writing quality?
- What compression explains why syntax fits in low-dimensional subspaces?
- Why does each rewrite cycle degrade domain-specific details differently than compression?
- How do rare linguistic registers differ from conceptually complex examples?
- Why does forcing single labels on emotions destroy information similar to language?
- How do LLMs compress specific expert knowledge into median abstraction?
- Why do language models tend to elaborate and expand rather than compress information?
- Why does adjusted compression performance degrade as models scale larger?
- How do lower network layers compress facts versus higher reasoning layers?
- Why does LLM compression eliminate causal grounding in conceptual representations?
- What fine-grained distinctions matter most for human situated action in categories?
- Why do student models learn better from internal pruning versus external compression?
- Is distribution selection during RL the same compression mechanism as entropy collapse?
- How do LLMs compress literary language without losing essential nuance?
- Can steering vectors be combined with other compression techniques?
- Does compressing Walton's schemes into nine categories make LLM classification easier?
- How does modeling capability relate to lossless compression in language models?
- Why do untrained summarizers focus on topics rather than preference dimensions?
- How does epiplexity measure extractable value differently from compression codelength?
- Why do frequent words rank higher in taxonomic abstraction hierarchies?
- How does co-occurrence statistics alone produce hierarchical concept organization?
- How does the compression view extend from trained models to training objectives?
Related concepts in this collection (6)
- Can text-trained models compress images better than specialized tools?
  Do general-purpose language models trained only on text outperform domain-specific compressors like PNG and FLAC on their native data? This tests whether compression ability is universal or requires domain specialization.
  qualified: LLMs compress excellently but with a strategy fundamentally different from human conceptual compression
- Does semantic grounding in language models come in degrees?
  Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities.
  mechanism: aggressive compression eliminates fine-grained distinctions needed for causal grounding
- Why do language models fail at communicative optimization?
  LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
  convergent: statistical regularity capture without communicative optimization parallels compression without nuance
- Are language models developing real functional competence or just formal competence?
  Neuroscience suggests formal linguistic competence (rules and patterns) and functional competence (real-world understanding) rely on different brain mechanisms. Can next-token prediction alone produce both, or does it leave functional competence behind?
  the compression-nuance split may correspond to the formal-functional split
- Do standard analysis methods hide nonlinear features in neural networks?
  Current representation analysis tools like PCA and linear probing may systematically miss complex nonlinear computations while over-reporting simple linear features. This raises questions about whether our interpretability methods are actually capturing what networks compute.
  analysis bias compounds the compression problem: LLMs aggressively compress representations, and our analysis tools are biased toward detecting the simple features that survive compression while missing the complex features that nuance requires. The measured gap between LLM and human conceptual representations may therefore be partly an analysis artifact
- Can we measure reading efficiency as a quality metric?
  How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally sparse.
  knowledge density (KD) provides a measurable consequence of aggressive compression: LLMs that compress conceptual representations into statistical patterns produce text with lower knowledge density, as compression eliminates the nuanced distinctions that create unique atomic knowledge units
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
- Do large language models resemble humans in language use?
- Semantic Structure in Large Language Model Embeddings
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
- Word Meanings in Transformer Language Models
- Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor
Original note title
llms prioritize aggressive statistical compression while humans preserve adaptive nuance and contextual richness in conceptual representations