How do model compression biases differ from human conceptual representation strategies?
This explores how the way LLMs squeeze concepts down (for efficiency) differs from how humans organize concepts (for usable, situated meaning) — and what the corpus reveals about the trade-offs each strategy makes.
The cleanest answer in the corpus comes from a study applying Rate-Distortion Theory to cognitive datasets: LLMs and humans are optimizing for different things. LLMs aggressively maximize compression efficiency, capturing broad category structure while discarding the fine-grained distinctions humans keep; humans instead trade compression away in favor of contextual, situated nuance, the kind of detail that lets you act in a specific situation rather than just classify it (Do LLMs compress concepts more aggressively than humans do?). So the bias isn't 'LLMs compress and humans don't': both compress, but humans deliberately keep slack where it pays off for meaning.
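To make the trade-off concrete, here is a minimal sketch of a rate-distortion-style objective for grouping items into categories. It assumes a textbook setup rather than the study's exact formulation. Rate is the mutual information I(X; C) between items and their category labels, which reduces to the label entropy H(C) for a deterministic assignment over uniformly weighted items. Distortion is the mean squared distance of an item's feature value from its category mean. A weight `beta` sets how much distortion counts against rate. The item values and labels are hypothetical.

```python
import math
from collections import Counter


def rate_bits(labels):
    """I(X; C) in bits for a deterministic labeling of uniformly weighted items.

    With a deterministic assignment H(C | X) = 0, so I(X; C) reduces to H(C).
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def distortion(values, labels):
    """Mean squared distance of each item's feature value from its category mean."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    means = {lab: sum(vs) / len(vs) for lab, vs in groups.items()}
    return sum((v - means[lab]) ** 2 for v, lab in zip(values, labels)) / len(values)


def objective(values, labels, beta):
    """Rate + beta * distortion; lower is better, and a small beta rewards compression."""
    return rate_bits(labels) + beta * distortion(values, labels)


# Hypothetical typicality scores for four members of one category.
values = [0.9, 0.8, 0.3, 0.2]
coarse = ["bird"] * 4                       # one broad category
fine = ["typical"] * 2 + ["atypical"] * 2   # keeps a fine-grained split

for beta in (1.0, 100.0):
    best = min((coarse, fine), key=lambda labels: objective(values, labels, beta))
    print(beta, "coarse" if best is coarse else "fine")
```

With distortion weighted lightly (`beta = 1.0`) the single broad category wins, which loosely mirrors the compression-first bias the study attributes to LLMs. Weighting distortion heavily (`beta = 100.0`) makes the fine-grained split worth its extra bit of rate.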
What's interesting is that this compression bias isn't a static property of the weights: models modulate it dynamically. Under hard or out-of-distribution tasks, an LLM's hidden states become sparser in a localized, systematic way that tracks task unfamiliarity and reasoning load. Rather than signaling a breakdown, this sparsification acts as a selective filter that stabilizes performance (Do language models sparsify their activations under difficult tasks?). And inside reasoning chains, tokens differ sharply in functional importance: when chains are pruned while preserving answer likelihood, symbolic computation tokens survive longest and grammar and meta-discourse are removed first (Which tokens in reasoning chains actually matter most?). Both findings show compression operating as an internal economy of attention (keep what does the work, drop what doesn't), which is structurally closer to human prioritization than the 'aggressive compression' headline suggests.
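The pruning behind the token-ranking result can be sketched as a greedy loop. At each step it drops the single token whose removal costs the least score, and it stops once a length budget is met. The cited note only describes the method as greedy and likelihood-preserving, so this loop is one plausible reading of that description, not the study's code. The scorer `toy_score`, the `IMPORTANCE` weights, and the category labels are hypothetical stand-ins for a language model scoring the final answer given the pruned chain.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (text, functional category); category labels are hypothetical


def greedy_prune(
    chain: List[Token], score: Callable[[List[Token]], float], keep: int
) -> Tuple[List[Token], List[Token]]:
    """Greedily delete tokens while losing as little score as possible.

    `score` stands in for the model's log-likelihood of the final answer given
    the (pruned) chain. Returns the kept tokens and the tokens in pruning order.
    """
    kept = list(chain)
    pruned: List[Token] = []
    while len(kept) > keep:
        # Try every single-token deletion; keep the one whose result scores highest.
        best_i = max(range(len(kept)), key=lambda i: score(kept[:i] + kept[i + 1:]))
        pruned.append(kept.pop(best_i))
    return kept, pruned


# Hypothetical toy scorer: each kept token adds a fixed amount of "answer
# likelihood" by category, so symbolic tokens are the costliest to remove.
IMPORTANCE = {"symbolic": 5.0, "grammar": 0.5, "meta": 0.1}


def toy_score(tokens: List[Token]) -> float:
    return sum(IMPORTANCE[cat] for _, cat in tokens)


chain = [("Okay,", "meta"), ("so", "meta"), ("2+3", "symbolic"), ("is", "grammar"), ("5", "symbolic")]
kept, pruned = greedy_prune(chain, toy_score, keep=2)
print([t for t, _ in pruned])  # tokens in the order they were pruned
print([t for t, _ in kept])    # tokens that survive the budget
```

Because this toy scorer is additive, greedy deletion is trivially optimal here. With a real model, token contributions interact, which is why the loop re-scores every candidate deletion at each step, at a cost that grows quadratically with chain length.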
But the divergence reappears sharply when you look at how that compressed knowledge behaves. Human concepts come bundled with the ability to apply them; LLM representations don't reliably carry that ability. 'Potemkin understanding' is the signature failure: a model can explain a concept correctly, fail to apply it, and even recognize its own failure. That triple pattern is incompatible with human cognition and suggests that explanation and execution run through functionally disconnected pathways rather than reflecting a simple knowledge gap (Can LLMs understand concepts they cannot apply?). Relatedly, models often appear to reason about constraints when they are really collapsing the problem into a conservative default: twelve of fourteen models tested perform worse when the constraints are removed, dropping by up to 38.5 percentage points (Are models actually reasoning about constraints or just defaulting conservatively?). The compression keeps the shape of competence while losing the part that makes it usable.
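One way to operationalize the 'triple pattern' is to score each concept on three probes (explain, apply, self-assess). You then count the concepts where the explanation is correct, the application is wrong, and the model flags its own application as wrong. The `ConceptProbe` record, its field names, and the sample data below are hypothetical. The cited work's exact protocol and metric may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConceptProbe:
    concept: str
    explained_correctly: bool  # explanation of the concept judged correct
    applied_correctly: bool    # e.g. classifying or generating an instance correctly
    flags_own_error: bool      # model later judges its own application as wrong


def is_potemkin(p: ConceptProbe) -> bool:
    """Explains correctly, fails to apply, and recognizes the failure."""
    return p.explained_correctly and not p.applied_correctly and p.flags_own_error


def potemkin_rate(probes: List[ConceptProbe]) -> float:
    """Share of correctly explained concepts that show the full triple pattern."""
    explained = [p for p in probes if p.explained_correctly]
    if not explained:
        return 0.0
    return sum(is_potemkin(p) for p in explained) / len(explained)


# Hypothetical probe results for four concepts.
probes = [
    ConceptProbe("haiku", True, False, True),         # full triple pattern
    ConceptProbe("ABAB rhyme", True, True, False),    # explains and applies
    ConceptProbe("sunk cost", True, False, False),    # fails, but doesn't notice
    ConceptProbe("slant rhyme", False, False, True),  # never explained correctly
]
print(potemkin_rate(probes))
```

Conditioning on correct explanations keeps the rate focused on what distinguishes this failure from an ordinary knowledge gap. A concept the model cannot explain is simply not counted.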
There's a deeper substrate question underneath all this. One line of work argues that text-only models inherit the abstraction limits of language itself: text strips out the physics, geometry, and causality present in reality, so the model manipulates symbols cut off from their source dynamics, with predictable failures in physical, geometric, and causal reasoning (Are text-only language models fundamentally limited by abstraction?). On that view, the model's compression bias is partly inherited: it is compressing an already-lossy human abstraction, while human conceptual representation stays anchored to embodied, situated experience. Yet under Habermas's distinction, taking the 'participant' rather than the 'observer' frame, humans and LLMs draw on the same symbolic substrate, making the gap structural rather than absolute (Do humans and LLMs differ fundamentally or just superficially?).
The thing you might not have known you wanted to know: these biases aren't only about how much gets thrown away, but about where the kept information ends up and whether it stays connected to use. Models trained with hidden chain-of-thought tokens compute correct answers in early layers (layers 1 to 3 in logit-lens analysis) and then actively suppress them in the final layers to produce format-compliant filler, even though the reasoning remains recoverable from lower-ranked predictions (Do transformers hide reasoning before producing filler tokens?). Models also fail to integrate context when strong parametric priors from training override what's in front of them, and textual prompting alone cannot overcome those priors (Why do language models ignore information in their context?). Human representation strategy is adaptive and context-preserving by design; the model's is efficiency-first, and many of its characteristic failures are the bill for that choice coming due.
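The logit-lens result rests on a simple readout. Each layer's hidden state is projected through the model's final normalization and unembedding matrix, which shows what token that layer would predict and where a given token ranks. The sketch below does this on tiny hand-built vectors in plain Python. The vocabulary, the identity unembedding, the per-layer hidden states, and the choice of an RMS-style normalization without a learned gain are all illustrative assumptions standing in for a real model's weights.

```python
import math
from typing import List

Vector = List[float]


def rms_norm(h: Vector, eps: float = 1e-6) -> Vector:
    """Stand-in for the model's final normalization layer (assumed RMSNorm, no gain)."""
    scale = math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [x / scale for x in h]


def logit_lens(h: Vector, unembed: List[Vector]) -> Vector:
    """Logits for every vocabulary item: dot product of the normalized state with each row."""
    z = rms_norm(h)
    return [sum(a * b for a, b in zip(z, row)) for row in unembed]


def rank_of(token_id: int, logits: Vector) -> int:
    """0 means top prediction; larger values are lower-ranked predictions."""
    return sum(1 for x in logits if x > logits[token_id])


VOCAB = ["...", "7", "cat", "the"]  # hypothetical; "..." plays the filler token
UNEMBED = [                          # one row per vocabulary item (d_model = 4)
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Hypothetical per-layer hidden states: the answer "7" dominates early layers,
# then the filler direction takes over while "7" stays in second place.
hidden = [
    [0.1, 2.0, 0.2, 0.1],  # layer 1
    [0.2, 1.8, 0.1, 0.3],  # layer 2
    [2.5, 1.0, 0.1, 0.2],  # final layer
]

for layer, h in enumerate(hidden, start=1):
    logits = logit_lens(h, UNEMBED)
    top = VOCAB[max(range(len(logits)), key=logits.__getitem__)]
    print(layer, top, rank_of(VOCAB.index("7"), logits))
```

With a real model, the per-layer hidden states could be collected, for instance, by passing `output_hidden_states=True` to a Hugging Face Transformers model. The readout is then applied with that model's own final norm and output embedding in place of the toy ones here.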
Sources (9 notes)
Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.
As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
Greedy likelihood-preserving pruning reveals six functional token categories; symbolic computation tokens are preferentially preserved while grammar and meta-discourse are pruned first. Student models trained on these pruned chains outperform those trained on frontier-model compression.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.
Text strips the physics, geometry, and causality present in reality, forcing language models to manipulate symbols without grounding in their source dynamics. This creates predictable failure modes in physical, geometric, and causal reasoning that multimodal training could address.
Applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.
Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.