INQUIRING LINE

Classic compression counts bits — but what if the real question is how much a compute-limited AI can actually use?

How does epiplexity measure extractable value differently from compression codelength?

This explores the difference between two ways of measuring 'how much is in the data': the classic compression view (codelength — how few bits can losslessly represent it) versus epiplexity, which asks how much value a learner with limited compute can actually extract and use.

The gap at issue is between counting bits and counting usable knowledge. Classic compression says the value of data is its codelength: the length of the shortest description that reproduces it losslessly. This view runs deep. Language modeling is *equivalent* to lossless compression, and a model that compresses well generalizes well: text-trained models compress images and audio better than specialized tools such as PNG and FLAC just by conditioning on context (see "Can text-trained models compress images better than specialized tools?"). Under this lens, learning and compressing are the same act, and the best measure of information is how short you can make the file.
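To make the prediction-equals-compression link concrete, here is a minimal sketch. It assumes a toy adaptive order-0 byte model with add-one smoothing, which is an illustration only and not the model used in the cited work. The function name `adaptive_codelength_bits` and the two sample inputs are hypothetical. The codelength is the sum of -log2 p(symbol | history), so better prediction means a shorter file.

```python
import math
from collections import Counter


def adaptive_codelength_bits(data: bytes, alphabet_size: int = 256) -> float:
    """Ideal codelength (in bits) of `data` under an adaptive order-0 model.

    Each symbol costs -log2 p(symbol | counts so far), with add-one smoothing.
    An arithmetic coder driven by the same probabilities would land within a
    few bits of this total.
    """
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for symbol in data:  # iterating over bytes yields ints in 0..255
        p = (counts[symbol] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)
        counts[symbol] += 1
        seen += 1
    return total_bits


# Two hypothetical 100-byte inputs: one highly predictable, one with no repeats.
repetitive = b"ab" * 50
varied = bytes(range(100))

print(adaptive_codelength_bits(repetitive))
print(adaptive_codelength_bits(varied))
```

The repetitive input comes out far shorter than the input with no repeated bytes, because the model's predictions improve as it sees the pattern. A stronger predictor, such as a language model conditioning on context, plays the same role with sharper probabilities.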

Epiplexity breaks from that by asking a different question: not 'how few bits?' but 'how much can a *bounded* learner actually pull out and put to work?' The corpus's sharpest statement of why this matters is that Shannon and Kolmogorov measures fail to value data because they assume an observer with unlimited compute (see "Why do Shannon and Kolmogorov measures fail to value data?"). To an omniscient compressor, a worked example and a raw data dump carry the same information once you account for the underlying process. But a real learner has a finite budget, which is exactly why curriculum order matters, why feature engineering helps, and why trained models can exceed the process that generated them. Codelength is blind to all of that; epiplexity is built to see it.
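The dependence on the observer's compute can be seen in a small sketch. It does not implement epiplexity's formal definition. It only assumes that `zlib` can stand in for a compute-bounded observer and that a linear congruential generator can stand in for a simple underlying process. The helper `lcg_bytes` and the rough `generator_bits` figure are hypothetical choices made for illustration.

```python
import zlib


def lcg_bytes(n: int, seed: int = 12345) -> bytes:
    """Return n pseudorandom bytes from a 32-bit linear congruential generator."""
    a, c, m = 1664525, 1013904223, 2**32  # commonly cited textbook constants
    state = seed
    out = bytearray()
    for _ in range(n):
        state = (a * state + c) % m
        out.append(state >> 24)  # keep the high byte of the 32-bit state
    return bytes(out)


stream = lcg_bytes(4096)

# What a bounded observer (here: DEFLATE at maximum effort) needs to store it.
bounded_bits = 8 * len(zlib.compress(stream, 9))

# Rough size of what an unbounded observer needs: a, c, the seed, and the
# length, each counted as one 32-bit integer (the fixed loop adds a constant).
generator_bits = 4 * 32

print(bounded_bits, generator_bits)
```

To an observer who can recover the generator, the stream costs on the order of a hundred bits. To the bounded compressor it is essentially incompressible. The data is identical in both cases; only the observer's budget changed, which is the dependence the classical measures leave out.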

The practical wedge between the two shows up when compression and usefulness pull apart. LLMs compress concepts far more aggressively than humans do, nailing broad category structure while discarding the fine-grained distinctions humans keep (see "Do LLMs compress concepts more aggressively than humans do?"). By pure codelength that's a win: fewer bits, cleaner categories. By an extractable-value measure it can be a loss, because the nuance humans preserve is what lets them act in a specific situation. Maximum compression and maximum usable value are simply not the same target.

You can also watch value *increase* under transformation, which a codelength account struggles with. Compressing Big Five personality scores into natural-language summaries surfaces second-order trait patterns that predict nine other psychological scales, and the summary-plus-score combination beats either alone: the rewrite adds extractable signal without adding raw information (see "Can language summaries unlock hidden psychological patterns?"). The bits didn't grow; what a bounded learner could *do* with them did. That synergy is precisely the quantity codelength can't price and epiplexity is meant to.
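A toy version of this effect, unrelated to the personality data itself, is sketched below. It assumes a deliberately weak learner (a one-feature rule, named `best_stump_accuracy` here for illustration) and a hand-picked reframing of XOR data. The added column is a deterministic function of the existing ones, so it contributes no new information, yet it changes what the weak learner can extract.

```python
from itertools import product


def best_stump_accuracy(rows, labels):
    """Best accuracy of any one-feature rule: predict the feature or its negation."""
    n_features = len(rows[0])
    best = 0.0
    for j in range(n_features):
        for polarity in (0, 1):
            # polarity 0 predicts the feature value, polarity 1 predicts its flip
            correct = sum((row[j] ^ polarity) == y for row, y in zip(rows, labels))
            best = max(best, correct / len(rows))
    return best


# All four binary input pairs, labelled by XOR.
raw = [list(bits) for bits in product((0, 1), repeat=2)]
labels = [a ^ b for a, b in raw]

# Reframing: append a column computed purely from the existing two columns.
reframed = [[a, b, int(a != b)] for a, b in raw]

print(best_stump_accuracy(raw, labels))       # 0.5
print(best_stump_accuracy(reframed, labels))  # 1.0
```

The raw and reframed tables are interchangeable to an ideal compressor, since either can be computed from the other. For the bounded learner, the reframed table moves accuracy from chance to perfect.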

The takeaway: compression codelength asks how cheaply data can be stored by an ideal observer, while epiplexity asks how much a constrained learner can mine from it. Those answers diverge whenever compute is finite, ordering matters, or a clever reframing makes latent structure suddenly learnable. The whole reason data engineering, curricula, and prompt context 'work' is that they raise extractable value without changing the underlying information content at all.

Sources (4 notes)

Why do Shannon and Kolmogorov measures fail to value data?

Both measures assume observers with unlimited compute and miss learnable, useful information. The gap explains why feature engineering helps, curriculum order matters, and trained models exceed their generating process—empirical facts classical theory cannot account for.

Can text-trained models compress images better than specialized tools?

Chinchilla models trained exclusively on text achieve better compression rates on images and audio than FLAC and PNG by using their context window to adapt as task-specific compressors. This demonstrates that generalization operates through compression, not specialization.

Do LLMs compress concepts more aggressively than humans do?

Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.

Can language summaries unlock hidden psychological patterns?

LLMs generate natural language personality summaries from Big Five scores that encode second-order trait patterns, enabling zero-shot prediction of nine other psychological scales with R² > 0.89 structural alignment. Combined summary-and-score predictions outperform either alone, showing synergistic information.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains open: How does epiplexity measure extractable value differently from compression codelength?

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2026; treat these as perishable claims to be re-tested:
• Language modeling is equivalent to lossless compression; models that compress well generalize well, even outperforming specialized tools (~2023).
• Shannon and Kolmogorov measures fail to value data because they assume unbounded compute; epiplexity instead prices extractable value under finite constraints.
• LLMs compress concepts more aggressively than humans, trading nuance for statistical efficiency; this can reduce usable value even when codelength drops.
• Reframing data (e.g., Big Five scores → natural-language summaries) increases extractable signal without adding raw bits; codelength cannot price this synergy (~2025).
• Sparse attention, curriculum order, and prompt context raise extractable value without changing information content—a phenomenon compression metrics miss (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2309.10668 (2023) — Language Modeling is Compression
• arXiv:2601.03220 (2026) — From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
• arXiv:2505.17117 (2025) — From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
• arXiv:2604.16351 (2026) — Training for Compositional Sensitivity Reduces Dense Retrieval Generalization

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models, methods (e.g., mixture-of-experts, adaptive routing), orchestration (caching, agentic workflows), or evaluation harnesses have since relaxed or overturned it. Separate the durable question—does extractable value differ from codelength under bounded compute?—from perishable claims (e.g., current model architectures exhibit specific compression-vs.-nuance trade-offs). Cite what resolved it; flag where the constraint still appears to hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Does any recent paper argue that epiplexity collapses back to codelength under sufficient scale, or that LLM compression actually *preserves* fine-grained value better than previously thought?
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Does scaling + constitutional training (or similar) make the human-vs.-LLM compression gap irrelevant? (b) Can epiplexity be formalized and computed tractably enough to replace codelength as a loss target?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
