INQUIRING LINE


AI squeezes knowledge down tight inside — so why do its answers always come out so wordy?

Why do language models tend to elaborate and expand rather than compress information?

This explores a real paradox in the corpus: LLMs compress aggressively *inside* their representations, yet their *outputs* tend to sprawl and elaborate — so the question is really about where the expansion comes from.

Language model *outputs* run long and elaborative even though, under the hood, these models compress information harder than humans do. The most direct corpus answer is that two different things are going on, and they pull in opposite directions. Internally, LLMs are ruthless compressors. Measured with rate-distortion theory against human cognitive data, models capture broad category structure but throw away the fine-grained, context-dependent distinctions humans preserve: they maximize statistical efficiency where humans trade efficiency for situated meaning (see "Do LLMs compress concepts more aggressively than humans do?"). So the elaboration you see in the text is not the model failing to compress its *knowledge*. The model compresses the concepts and then spends many tokens emitting them.
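The trade-off behind that rate-distortion comparison can be sketched with a toy example. This is a minimal illustration, not the cited study's method: the points, the cluster labels, and the helper names `complexity_bits` and `distortion` are all hypothetical. Complexity is taken as the entropy of the cluster labels, and distortion as the mean squared distance from each item to its cluster centroid.

```python
import math
from collections import Counter

def complexity_bits(assignments):
    """Entropy H(C) of the cluster labels, in bits.

    For a hard clustering of equally likely items, the mutual
    information I(X; C) between items and clusters equals H(C).
    """
    n = len(assignments)
    counts = Counter(assignments)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def distortion(points, assignments):
    """Mean squared distance from each point to its cluster centroid."""
    groups = {}
    for p, c in zip(points, assignments):
        groups.setdefault(c, []).append(p)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total / len(points)

# Hypothetical 2-D "embeddings" for four items (illustrative values only).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

coarse = [0, 0, 1, 1]  # two broad categories: cheap to encode, loses detail
fine = [0, 1, 2, 3]    # every item kept distinct: costly, loses nothing

coarse_cost = (complexity_bits(coarse), distortion(points, coarse))  # (1.0, 1.0)
fine_cost = (complexity_bits(fine), distortion(points, fine))        # (2.0, 0.0)
```

In this toy setup the coarse clustering costs 1 bit and accepts a distortion of 1.0, while the fine clustering costs 2 bits and has no distortion. The passage's claim corresponds to models sitting nearer the coarse end and humans nearer the fine end.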

Why spend the tokens? Part of the answer is that the surface text and the actual computation have come unhooked from each other. When models are trained to hide their reasoning, the correct answer is computed in the first few layers (layers 1-3 in the logit lens analysis) and then actively *overwritten* in the final layers to produce format-compliant filler. The real reasoning is still recoverable from lower-ranked predictions, but what gets printed is padding (see "Do transformers hide reasoning before producing filler tokens?"). Expansion, in other words, can be a performance: the model has already decided, and the extra prose is downstream of formatting pressure rather than thinking.
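A logit lens reading of this kind can be sketched in a few lines. The sketch below is a toy under stated assumptions: the three-token vocabulary, the unembedding rows, and the per-layer hidden states are invented for illustration, and the final layer norm that a real logit lens usually applies before unembedding is omitted. It only shows the mechanics of projecting each layer's state to token logits and tracking the rank of the answer token.

```python
def logit_lens(hidden, unembed):
    """Project one hidden state through the unembedding rows to get a logit per token."""
    return {tok: sum(w * h for w, h in zip(row, hidden)) for tok, row in unembed.items()}

def rank_of(token, logits):
    """1-based rank of `token` when tokens are sorted by descending logit."""
    ordered = sorted(logits, key=logits.get, reverse=True)
    return ordered.index(token) + 1

def top_token(logits):
    """Token with the highest logit, i.e. what would be printed greedily."""
    return max(logits, key=logits.get)

# Hypothetical 3-token vocabulary with 2-D unembedding rows.
unembed = {
    "42": [1.0, 0.0],     # the "answer" token
    "...": [0.0, 1.0],    # the filler token
    "cat": [-1.0, -1.0],  # an unrelated token
}

# Hypothetical hidden states at the output position, one per layer.
hidden_by_layer = [
    [2.0, 0.1],  # layer 1: answer direction dominates
    [2.0, 1.0],  # layer 2: answer still on top
    [1.5, 3.0],  # layer 3: filler direction takes the top slot
]

per_layer_logits = [logit_lens(h, unembed) for h in hidden_by_layer]
answer_ranks = [rank_of("42", lg) for lg in per_layer_logits]  # [1, 1, 2]
top_tokens = [top_token(lg) for lg in per_layer_logits]        # ['42', '42', '...']
```

In the toy data the answer token is ranked first in the early layers and drops to rank 2 at the last layer, where the filler token is what a greedy decoder would print. That is the pattern the passage describes: the answer remains recoverable from a lower-ranked prediction.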

The corpus traces that formatting pressure to a training incentive. RLHF rewards models for producing confident, complete-looking answers rather than for asking a clarifying question or stopping short, which is exactly the behavior that makes them lock in early and then justify at length (see "Why do language models lose performance in longer conversations?"). A related dynamic shows up in exploration: uncertainty signals dominate the early transformer blocks while longer-horizon 'empowerment' signals emerge only in the middle blocks, so models commit before the signals that would make them hold back ever arrive (see "Why do large language models explore less effectively than humans?"). Premature commitment plus a reward for looking complete is a recipe for confident elaboration.

The genuinely surprising thread is that compression and selectivity *do* happen, just not in the visible output. Under hard, out-of-distribution tasks, hidden states sparsify in a localized, systematic way that acts as a stabilizing filter, not a breakdown (see "Do language models sparsify their activations under difficult tasks?"). So the model is quietly narrowing internally at the same moment its prose may be widening. The thing you didn't know you wanted to know: 'compress vs. elaborate' isn't one knob. These systems compress meaning and inflate text simultaneously, and most of what reads as verbosity is a training-shaped output habit layered on top of a compression engine.
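One way to make "hidden states sparsify" concrete is to score an activation vector for sparsity. The source does not say which metric the underlying paper uses, so the two measures below (the fraction of near-zero entries, and Hoyer's L1/L2-based sparsity) are assumptions chosen for illustration, as are the example vectors and the `eps` threshold.

```python
import math

def near_zero_fraction(activations, eps=1e-3):
    """Fraction of entries whose magnitude falls below `eps` (an assumed threshold)."""
    return sum(1 for a in activations if abs(a) < eps) / len(activations)

def hoyer_sparsity(activations):
    """Hoyer sparsity: 0.0 for a uniform-magnitude vector, 1.0 for a one-hot vector."""
    n = len(activations)
    if n < 2:
        raise ValueError("need at least two activations")
    l1 = sum(abs(a) for a in activations)
    l2 = math.sqrt(sum(a * a for a in activations))
    if l2 == 0.0:
        raise ValueError("sparsity is undefined for an all-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

# Hypothetical hidden-state vectors (illustrative values only).
familiar_task = [1.0, 1.0, 1.0, 1.0]    # activity spread across all units
unfamiliar_task = [0.0, 0.0, 0.0, 3.0]  # activity concentrated in one unit

dense_scores = (near_zero_fraction(familiar_task), hoyer_sparsity(familiar_task))       # (0.0, 0.0)
sparse_scores = (near_zero_fraction(unfamiliar_task), hoyer_sparsity(unfamiliar_task))  # (0.75, 1.0)
```

Comparing such scores across easy and hard inputs, layer by layer, is the kind of measurement that would show whether activity narrows as tasks move out of distribution.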

Sources (5 notes)

Do LLMs compress concepts more aggressively than humans do?

Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Why do language models lose performance in longer conversations?

LLMs degrade in multi-turn settings because RLHF training rewards premature answers over clarification-seeking, creating pragmatic mismatch with individual user behaviors. A Mediator-Assistant architecture that explicitly parses user intent before execution recovers lost performance without retraining.

Why do large language models explore less effectively than humans?

SAE decomposition shows uncertainty values dominate early transformer blocks while empowerment representations emerge only in middle blocks. This temporal mismatch causes models to commit to decisions before long-term exploration signals can influence them. Reasoning-trained o1 overcomes this by extending computation time.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs (3.42 match)
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey (1.68 match)
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs (1.68 match)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control (1.67 match)
Semantic Structure in Large Language Model Embeddings (1.64 match)
Large Language Models Think Too Fast To Explore Effectively (0.91 match)
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning (0.91 match)
LLMs Get Lost In Multi-Turn Conversation (0.91 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher evaluating claims about why language models elaborate rather than compress their outputs. The question: do LLMs genuinely reason through problems, or do they commit to shallow answers early and then rationalize them post-hoc with padding?

What a curated library found — and when (spanning Feb 2024–Mar 2026, but treat as dated claims, not current truth):
• LLMs compress semantic meaning ruthlessly (discarding fine-grained distinctions humans preserve), yet emit long, elaborative text: two separate processes pulling in opposite directions (~2025).
• Correct answers are computed in early transformer layers, then actively overwritten in later layers to produce format-compliant filler; real reasoning is in lower-ranked predictions (~2024–2025).
• RLHF rewards confident, complete-looking answers over asking clarifying questions, causing models to lock in early then justify at length (~2026).
• Uncertainty signals peak in early blocks; longer-horizon empowerment signals emerge only in middle layers, so models commit before signals that would make them hold back arrive (~2025).
• Under OOD tasks, hidden states sparsify systematically as a stabilizing filter, while surface text elaborates — compression and expansion happen simultaneously (~2026).

Anchor papers (verify; mind their dates):
• arXiv:2505.17117 — From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning (May 2025).
• arXiv:2412.04537 — Understanding Hidden Computations in Chain-of-Thought Reasoning (Dec 2024).
• arXiv:2501.18009 — Large Language Models Think Too Fast To Explore Effectively (Jan 2025).
• arXiv:2602.07338 — Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (Feb 2026).

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, determine whether model scale, architectural shifts (sparse attention, adaptive compute, recurrent layers), inference-time steering (e.g., backtracking, uncertainty-aware sampling), or new RLHF/DPO approaches have since relaxed or overturned it. Separate the durable question (e.g., "Do models rationalize post-hoc?") from the perishable limitation (e.g., "RLHF always locks in early"). Cite what changed it; flag what still appears to hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months that shows models *can* hold uncertainty, backtrack mid-generation, or avoid premature commitment — or that shows they cannot.
(3) Propose 2 research questions that ASSUME the regime may have shifted: e.g., do newer preference models reward uncertainty? Do test-time compute + rollout allow genuine exploration?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
