INQUIRING LINE


AI models store facts near their input end and do reasoning near the output end — and training one can quietly break the other.

Why do higher network layers capture procedural knowledge but lower layers store facts?

This explores the finding that LLMs seem to split labor by depth — facts retrieved in lower layers, reasoning and procedure assembled in higher ones — and asks why that division shows up.

The cleanest statement of the pattern comes from a two-phase inference model: knowledge retrieval operates in lower network layers, while reasoning adjustment happens in higher layers (see "Why does reasoning training help math but hurt medical tasks?"). This is more than a curiosity because it has practical bite: it explains why training a model harder on reasoning improves math but can quietly *degrade* knowledge-heavy domains like medicine. If the two functions live in different real estate, tuning one can evict the other.

Why would depth organize itself this way? A useful clue is that procedural and factual knowledge are sourced differently during pretraining in the first place. Reasoning leans on broad, transferable procedures pulled from many diverse documents, while factual recall depends on narrow, document-specific memorization of a single target fact (see "Does procedural knowledge drive reasoning more than factual retrieval?"). Facts are point lookups; procedures are patterns abstracted across thousands of examples. It is plausible, then, that a lookup resolves early (you either have the entry or you don't) and that the slower work of combining and transforming those entries stacks up afterward: higher layers operate on what lower layers have already surfaced.

The "reasoning happens higher, and late" story gets a sharper test from interpretability work showing transformers doing real computation early and then *rewriting* it. In models trained with hidden chain-of-thought, the correct answer is computed in layers 1–3 and then actively suppressed in the final layers to emit format-compliant filler (see "Do transformers hide reasoning before producing filler tokens?"). That complicates any tidy "facts low, reasoning high" map: the same vertical axis is being used for retrieval, transformation, *and* output shaping, and what a layer is "for" depends on how the model was trained. A related caution: identical performance can hide radically different internal structures, so a layer-function map that holds for one model may not transfer to another (see "What actually happens inside the minds of language models?" and "What really happens inside a language model?").
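The layer-by-layer reading described above is usually done with a logit lens: each layer's hidden state is projected through the unembedding matrix to see which token it currently favors. The following is a minimal sketch of that idea only. The vocabulary, the identity-like `UNEMBED` matrix, the five hypothetical layers, and every number in `HIDDEN_BY_LAYER` are invented for illustration and come from no real model or paper; a real logit lens would also typically apply the model's final normalization before projecting.

```python
import math

# Toy vocabulary and unembedding matrix (rows = tokens, columns = hidden dims).
# ASSUMPTION: all values are made up for illustration, not taken from any model.
VOCAB = ["7", "12", "..."]
UNEMBED = [
    [1.0, 0.0, 0.0],  # "7"
    [0.0, 1.0, 0.0],  # "12" (the correct answer in this toy setup)
    [0.0, 0.0, 1.0],  # "..." (format-compliant filler token)
]

# Hypothetical hidden states at one position after each of five layers.
HIDDEN_BY_LAYER = [
    [0.2, 0.9, 0.1],  # layer 1: the answer already dominates
    [0.1, 1.4, 0.3],  # layer 2
    [0.1, 1.6, 0.8],  # layer 3
    [0.1, 1.2, 1.5],  # layer 4: filler overtakes the answer
    [0.0, 0.9, 2.5],  # layer 5 (final): filler wins, answer drops to rank 2
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden):
    """Project a hidden state through UNEMBED and rank tokens by probability."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in UNEMBED]
    probs = softmax(logits)
    return sorted(zip(VOCAB, probs), key=lambda pair: pair[1], reverse=True)


def top_token_per_layer(hidden_by_layer):
    """Return the highest-probability token read out at each layer."""
    return [logit_lens(hidden)[0][0] for hidden in hidden_by_layer]


if __name__ == "__main__":
    for index, hidden in enumerate(HIDDEN_BY_LAYER, start=1):
        ranked = logit_lens(hidden)
        print(f"layer {index}: top={ranked[0][0]!r}, runner-up={ranked[1][0]!r}")
```

In this toy setup the answer token leads in the early layers and survives only as the runner-up at the final layer, which is the shape of result the source note describes as "recoverable from lower-ranked token predictions".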

The deeper "why" may be that this layering is the network discovering modularity on its own. Pruning experiments show neural nets naturally decompose tasks into isolated subnetworks, and pretraining makes that modular structure more consistent and reliable (see "Do neural networks naturally learn modular compositional structure?"). Separating storage from manipulation is exactly the kind of reusable structure that a compositional system would converge on: a fact retriever you can call from many different procedures is more efficient than re-deriving facts inside every reasoning path.
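The ablation logic behind such pruning experiments can be illustrated with a toy comparison. This is a sketch under stated assumptions, not the method of the cited work: the two weight matrices, the module names `A` and `B`, and the unit assignments in `UNITS` are all hypothetical. In the block-diagonal (modular) matrix, silencing one module's hidden units changes only that module's outputs; in the dense (entangled) matrix, the same ablation changes every output.

```python
# ASSUMPTION: weights, module names, and unit assignments are invented for illustration.
MODULAR = [
    [1.0, 2.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]
ENTANGLED = [
    [1.0, 2.0, 0.5, 0.5],
    [0.5, 1.0, 1.0, 0.5],
    [0.5, 0.5, 3.0, 1.0],
    [1.0, 0.5, 1.0, 2.0],
]
UNITS = {"A": [0, 1], "B": [2, 3]}  # hidden units assigned to each module


def forward(weights, hidden):
    """Linear readout: one output per row of the weight matrix."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def ablate(hidden, module):
    """Return a copy of the hidden state with one module's units zeroed."""
    dead = set(UNITS[module])
    return [0.0 if i in dead else h for i, h in enumerate(hidden)]


def affected_outputs(weights, hidden, module):
    """Indices of outputs that change when the given module is ablated."""
    base = forward(weights, hidden)
    cut = forward(weights, ablate(hidden, module))
    return [i for i, (b, c) in enumerate(zip(base, cut)) if b != c]


if __name__ == "__main__":
    hidden = [1.0, 1.0, 1.0, 1.0]
    for name, weights in (("modular", MODULAR), ("entangled", ENTANGLED)):
        for module in UNITS:
            print(name, module, affected_outputs(weights, hidden, module))
```

Real experiments identify the subnetworks by pruning rather than assuming them up front; the sketch only shows what "ablations affecting only their corresponding function" looks like once a modular structure exists.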

If you want to go one step laterally: this internal split echoes how brains and hybrid AI systems are organized. One framing maps transformer weights to a "neocortex" of consolidated knowledge, retrieval systems to hippocampal indexing, and agentic state to prefrontal control (see "Can brain memory systems explain how LLMs should store knowledge?"). The recurring lesson across all of these is the same one that makes the original finding matter: knowing *where* a model keeps a capability tells you what you'll break when you train on top of it.

Sources 7 notes

Why does reasoning training help math but hurt medical tasks?

Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

What actually happens inside the minds of language models?

LLMs can achieve identical accuracy while maintaining radically different internal representations, and mechanisms that appear interpretable may not causally drive outputs. This decoupling means performance metrics alone mask crucial differences in how models actually work.

What really happens inside a language model?

Research into mechanistic interpretability, cognitive models, and training dynamics shows that identical benchmark performance conceals radically different internal structures. Improving one capability (helpfulness, accuracy) reliably degrades others (faithfulness, calibration, diversity).


Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Can brain memory systems explain how LLMs should store knowledge?

Research shows transformer weights function as a distributed neocortex for consolidated knowledge, RAG stores as hippocampal indexing for rapid encoding, and agentic state as prefrontal executive control. The CLS framework predicts why hybrid systems outperform single-tier approaches and identifies missing consolidation mechanisms that prevent memory integration.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a mechanistic interpretability analyst. The question remains open: *Why do transformer networks localize factual retrieval to lower layers and procedural reasoning to higher ones—and does this division persist across newer model families and training regimes?*

What a curated library found—and when (dated claims, not current truth):
Findings span 2023–2026 and include:
- Factual knowledge retrieval operates in lower layers; reasoning adjustment happens in higher layers, with different training objectives (e.g., hidden chain-of-thought) actively *rewriting* early computations in final layers rather than simply appending to them (~2024–2025).
- Procedural knowledge sourced during pretraining across diverse documents drives generalization; factual recall depends on narrow, document-specific memorization—suggesting lookup resolves early while transformation stacks higher (~2024).
- Neural networks decompose compositional tasks into modular, isolated subnetworks without explicit supervision; pretraining strengthens this modularity (~2023–2024).
- Identical downstream performance can mask radically different internal layer structures across model variants, so a depth-function map may not transfer (~2024–2025).
- Recent agentic and tool-augmented systems (2025–2026) blur the boundary by offloading retrieval to external knowledge graphs and memory systems, potentially reshaping what "lower layer" function means.

Anchor papers (verify; mind their dates):
- arXiv:2411.12580 (2024-11): Procedural Knowledge in Pretraining Drives Reasoning
- arXiv:2412.04537 (2024-12): Understanding Hidden Computations in Chain-of-Thought Reasoning
- arXiv:2507.18178 (2025-07): Decoupling Knowledge and Reasoning in LLMs: Dual-System
- arXiv:2601.09113 (2026-01): The AI Hippocampus: How Far are We From Human Memory?

Your task:
(1) RE-TEST EACH CONSTRAINT. For models trained post-2025 (reasoning-scaled, tool-augmented, or agentic), judge whether the lower=facts, higher=reasoning split still holds or has been relaxed by architectural changes (e.g., retrieval-augmented generation, dynamic routing, or external memory indexing). Distinguish the durable question (why modularity emerges) from perishable limitations (specific to dense transformers without tooling).
(2) Surface the strongest *contradicting* finding: has recent work on in-context learning, mixture-of-experts routing, or agent orchestration *overturned* the depth-function hierarchy, or merely relocated it to system level?
(3) Propose two questions assuming the regime has moved: (a) If lower layers no longer *must* contain facts (via retrieval-augmentation), what functions do they actually compute? (b) Does the modularity principle hold *across* system boundaries—i.e., do agentic systems automatically partition knowledge and reasoning across tools the same way networks do internally?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
