INQUIRING LINE

How do LLMs compress specific expert knowledge into median abstraction?

This explores what happens when an LLM flattens the precise, situation-specific knowledge an expert carries into a smoothed-out 'average' version — and why that flattening seems to be baked into how these models work.


When an LLM takes the precise, hard-won distinctions an expert holds and squeezes them into a smoothed-out average, the corpus suggests this is not a bug you can tune away: it is the objective the model is quietly optimizing for. The clearest evidence comes from work that measures LLMs against human concept representations through the lens of rate-distortion theory. Models capture the broad shape of a category but discard the fine-grained distinctions humans preserve, because they maximize compression efficiency where humans trade some efficiency for the contextual meaning that lets them act in a specific situation (see "Do LLMs compress concepts more aggressively than humans do?"). That trade-off is the whole story in miniature: expert knowledge *is* the fine-grained distinctions, and aggressive compression is exactly what erases them.
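For reference, the classical rate-distortion trade-off can be written as below. This is the textbook form, offered only as a sketch: the cited work's exact objective and distortion measure may differ, and the symbols ($X$ for the items, $\hat{X}$ for their compressed representation, $d$ for a distortion function, $\beta$ for the trade-off weight) are notation assumed here for illustration.

```latex
% Constrained form: the fewest bits that keep expected distortion at or below D
R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\; \mathbb{E}[d(X,\hat{X})] \,\le\, D} \; I(X;\hat{X})

% Equivalent Lagrangian form: beta sets the price of distortion relative to rate
\mathcal{L}_{\beta} \;=\; I(X;\hat{X}) \;+\; \beta\,\mathbb{E}\big[d(X,\hat{X})\big]
```

Read this way, a system that drives $I(X;\hat{X})$ down hard accepts a larger expected distortion, which means items that differ in small ways get merged. Fine-grained expert distinctions are the kind of detail that costs rate to keep.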

Why can't the model just hold onto the specifics? Part of the answer is that LLMs reason by semantic association, not symbolic rule-following. When researchers strip the familiar surface meaning away from a task, leaving the logic intact but the words unfamiliar, performance collapses even with the correct rules in context. That reveals a model leaning on commonsense token patterns drawn from its training distribution rather than manipulating the actual structure (see "Do large language models reason symbolically or semantically?"). Expert knowledge often lives precisely in the exceptions to commonsense, so a system that defaults to the statistical center of the distribution will reliably regress toward the median.
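The kind of decoupling described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the cited research: the function name `decouple_semantics`, the `tok<N>` placeholder scheme, and the family-relation example are all assumptions made for this sketch.

```python
import re

def decouple_semantics(statements, terms):
    """Replace each listed term with an opaque placeholder, consistently.

    The logical structure of the statements is untouched; only the
    familiar words are swapped for meaningless tokens.
    """
    mapping = {term: f"tok{i}" for i, term in enumerate(terms, start=1)}
    # Longest terms first, so a longer term is always tried before a shorter one.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    rewritten = [pattern.sub(lambda m: mapping[m.group(1)], s) for s in statements]
    return rewritten, mapping

# Hypothetical example task: one rule and two facts.
rules = [
    "If X is a parent of Y and Y is a parent of Z, then X is a grandparent of Z.",
    "Alice is a parent of Bob.",
    "Bob is a parent of Carol.",
]
terms = ["parent", "grandparent", "Alice", "Bob", "Carol"]

abstract_rules, mapping = decouple_semantics(rules, terms)
for line in abstract_rules:
    print(line)
```

Both versions of the task admit the same deduction. Comparing a model's accuracy on `rules` and on `abstract_rules` is one way to see how much of its performance rides on the familiar words rather than on the structure.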

This produces a strange failure signature: a model can give a textbook-correct *explanation* of an expert concept and then fail to *apply* it, sometimes even recognizing its own failure. That pattern doesn't look like a human knowledge gap at all (see "Can LLMs understand concepts they cannot apply?"). The explanation survives compression because it is the high-frequency, well-rehearsed surface; the application requires the buried specifics that got smoothed away. The same decoupling shows up structurally: models that reach identical outputs can carry radically different internal representations, so behavioral fluency tells you little about whether the underlying expert structure is actually there (see "What actually happens inside a language model?" and "What actually happens inside the minds of language models?").

You can watch this median-pull bite hardest where there is no commonsense shortcut to lean on. On genuine constrained-optimization problems, LLMs plateau at roughly 55–60% constraint satisfaction regardless of scale, and on iterative numerical methods they pattern-match a memorized template and emit plausible-but-wrong values instead of actually executing the procedure (see "Do larger language models solve constrained optimization better?" and "Do large language models actually perform iterative optimization?"). Even reinforcement fine-tuning doesn't reverse this: it tends to *sharpen* the template-matching rather than install the missing procedure, with sharp drops on slightly out-of-distribution variants (see "Do fine-tuned language models actually learn optimization procedures?").
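To make the gap between emitting a plausible value and executing a procedure concrete, here is a small sketch that checks a claimed answer against the result of actually running the iteration. It is an illustration only and is not drawn from the cited studies: Newton's method is swapped in as a stand-in iterative procedure, and the function names, tolerances, and test function are assumptions chosen for this example.

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Run Newton's method and return the iterate once |f(x)| <= tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            return x
        x = x - fx / df(x)  # one genuine update step
    raise RuntimeError("Newton's method did not converge")

def matches_execution(claimed, f, df, x0, rel_tol=1e-6):
    """Compare a claimed answer with the value obtained by really iterating."""
    reference = newton(f, df, x0)
    return abs(claimed - reference) <= rel_tol * max(1.0, abs(reference))

# Hypothetical problem: find the positive root of x^2 - 2.
def f(x):
    return x * x - 2.0

def df(x):
    return 2.0 * x

accurate_claim = matches_execution(1.41421356, f, df, x0=1.0)
plausible_but_wrong = matches_execution(1.4, f, df, x0=1.0)
print(accurate_claim, plausible_but_wrong)
```

The value 1.4 looks reasonable for the root but fails the check, which is the flavor of error described above: close to a remembered template, not the output of the procedure.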

The more useful turn here is that the corpus also points to a workaround: if the model won't preserve specificity internally, you can carry it externally. Embedding the LLM inside an explicit algorithm that feeds it only step-relevant context, or externalizing reasoning into knowledge-graph triples it builds and checks as it goes, lets even small models hold onto distinctions they would otherwise average away. KGoT, for example, reports a 29% improvement on GAIA Level 3 tasks using GPT-4o mini (see "Can algorithms control LLM reasoning better than LLMs alone?" and "Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?"). The thing you didn't expect to learn: the cure for median abstraction isn't a bigger model, it's a scaffold that keeps the specifics outside the compression.
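The scaffold idea can be sketched as an ordinary loop that owns the state and shows the model only what each step needs. This is a toy under stated assumptions, not the KGoT or LLM Programs implementation: `run_scaffold`, `fake_llm`, `is_valid`, and the valve data are all hypothetical, and `fake_llm` is a canned stand-in where a real model call would go.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def run_scaffold(
    steps: List[str],
    llm: Callable[[str], List[Triple]],
    is_valid: Callable[[Triple], bool],
) -> Set[Triple]:
    """Drive the model step by step; the graph lives outside the model."""
    graph: Set[Triple] = set()
    for step in steps:
        # Information hiding: show only triples whose subject or object
        # is mentioned in the current step.
        relevant = sorted(t for t in graph if t[0] in step or t[2] in step)
        prompt = f"Step: {step}\nKnown facts: {relevant}"
        for triple in llm(prompt):
            if is_valid(triple):  # quality control before anything is stored
                graph.add(triple)
    return graph

def fake_llm(prompt: str) -> List[Triple]:
    """Canned stand-in for a model call; one returned triple is malformed."""
    step_line = prompt.splitlines()[0]
    if "rated pressure" in step_line:
        return [("valve_7", "rated_pressure_bar", "16"), ("valve_7", "", "unknown")]
    return [("valve_7", "connects_to", "pipe_3")]

def is_valid(triple: Triple) -> bool:
    """Accept a triple only if all three fields are non-empty."""
    return all(field.strip() for field in triple)

steps = [
    "look up the rated pressure of valve_7",
    "check what valve_7 connects to",
]
graph = run_scaffold(steps, fake_llm, is_valid)
print(sorted(graph))
```

The design choice worth noticing is that the specific facts are stored and validated by plain code, so they never depend on the model retaining them across steps.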


Sources (10 notes)

Do LLMs compress concepts more aggressively than humans do?

Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

What actually happens inside a language model?

Research shows that LLMs can achieve the same output through different internal mechanisms, and improvements in one dimension like accuracy reliably degrade others like faithfulness and calibration. Internal structure matters even when behavior appears identical.

What actually happens inside the minds of language models?

LLMs can achieve identical accuracy while maintaining radically different internal representations, and mechanisms that appear interpretable may not causally drive outputs. This decoupling means performance metrics alone mask crucial differences in how models actually work.

Do larger language models solve constrained optimization better?

Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.

Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?

Knowledge Graph of Thoughts (KGoT) achieves 29% improvement on GAIA Level 3 tasks using GPT-4o mini by externalizing reasoning into iteratively constructed KG triples. The approach improves transparency, reduces bias, and enables quality control over reasoning steps.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about LLM knowledge compression against the current frontier. The question remains open: *Do LLMs structurally compress expert knowledge into median abstraction, and can this be reversed or circumvented?*

What a curated library found — and when (dated claims, not current truth): Findings span 2023–2026.
• LLMs reason by semantic association, not symbolic rule-following; strip surface meaning and performance collapses, revealing reliance on token patterns over actual structure (2023).
• Models achieve textbook-correct *explanation* but fail *application* of expert concepts — a decoupling signature distinct from human knowledge gaps (2025–2026).
• On constrained optimization, LLMs plateau ~55–60% regardless of scale; on iterative methods, they pattern-match memorized templates instead of executing procedures (2025–2026).
• RL fine-tuning sharpens template-matching rather than installing missing procedures; out-of-distribution tests show sharp drops (2025).
• Externalizing reasoning into knowledge graphs or algorithm scaffolds keeps specifics outside the model's compression; KGoT, for example, reports a 29% improvement on GAIA Level 3 tasks using GPT-4o mini (2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — semantic vs. symbolic reasoning
• arXiv:2505.17117 (2025) — compression vs. meaning trade-off
• arXiv:2507.10624 (2025) — architectural limits in symbolic computation
• arXiv:2504.02670 (2025) — knowledge graph externalisation

Your task:
(1) RE-TEST EACH CONSTRAINT. For the semantic-association thesis, the explanation–application decoupling, and the 55–60% plateau: judge whether newer models (o1, claude-opus, deeper reasoning chains), in-context tools (in-place prompting, chain-of-thought variants), or orchestration (agentic RAG, multi-step scaffolds) have since RELAXED these limits. Surface which constraints still hold and which have shifted; cite the mechanism of any relaxation.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for papers claiming LLMs *can* preserve expert specificity under certain conditions, or showing that reasoning-focused architectures bypass median pull.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., "If externalisation solves compression at inference, does training on externally-scaffolded reasoning change pretraining dynamics?" or "Do reasoning-optimized LLMs show *different* compression signatures than base models?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
