INQUIRING LINE


An AI trained to reason better can get worse at medicine — because reasoning and knowledge live in different parts of the model.

How does cross-domain reasoning transfer differ from domain-specific knowledge transfer?

This explores the difference between transferring *reasoning skills* that carry across domains (the how-to-think layer) and transferring *domain-specific knowledge* that's tied to particular facts and fields (the what-you-know layer) — and why the corpus treats these as two different machines with two different failure modes.

The corpus draws a surprisingly clean line between reasoning transfer and knowledge transfer, and the cleanest place to see it is a finding that the two physically live in different parts of the network. One analysis locates knowledge retrieval in the *lower* layers of an LLM and reasoning adjustment in the *higher* layers (see "Why does reasoning training help math but hurt medical tasks?"). That separation isn't just architectural trivia: it explains why training a model harder on reasoning improves math but can actively *degrade* knowledge-heavy fields like medicine. Reasoning and knowledge are not the same substance, so improving one can erode the other.

The deeper reason they behave differently shows up in what each one depends on. Reasoning generalizes because it rides on *procedural* knowledge (broad, transferable patterns drawn from many diverse documents), whereas factual recall depends on narrow, document-specific memorization of the exact target fact (see "Does procedural knowledge drive reasoning more than factual retrieval?"). That's the mechanism behind cross-domain transfer: a procedure learned in one place is reusable elsewhere, but a memorized fact is not. The same intuition powers the finding that reconstructing experts' *hidden* thought processes (the self-talk, recall, and verification beneath a finished expert text) produces reasoning that transfers across domains better than training only on the polished surface text (see "Can reconstructing expert thinking improve reasoning transfer?"). You transfer reasoning by capturing process; you transfer knowledge by capturing content.

But cross-domain reasoning transfer has a hard limit the corpus is blunt about: it's bounded by the training distribution. Chain-of-thought reasoning degrades *predictably* under shifts in task, length, or format: models keep producing fluent reasoning that has quietly stopped being valid (see "Does chain-of-thought reasoning actually generalize beyond training data?"). So 'transferable' reasoning is really 'transferable within a neighborhood,' not unbounded. On the knowledge side there's an even harder wall: prompting can only *activate* knowledge already in the model, and no prompt strategy injects facts the model never learned (see "Can prompt optimization teach models knowledge they lack?"). Knowledge transfer therefore demands actually changing the weights; reasoning transfer can sometimes be coaxed out of what's latent.
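One way to check the "transferable within a neighborhood" claim on your own model is to score the same reasoning task separately for each kind of shift and compare against the in-distribution baseline. The sketch below is a minimal illustration under stated assumptions: the function names, the `(shift_label, is_correct)` record shape, and the `"in_distribution"` label are all hypothetical, and nothing here reproduces the evaluation protocol of the cited work.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_shift(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy per shift bucket.

    Each record is (shift_label, is_correct). The labels are an assumption
    of this sketch, e.g. "in_distribution", "task", "length", "format".
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for label, is_correct in records:
        totals[label] += 1
        hits[label] += int(is_correct)
    return {label: hits[label] / totals[label] for label in totals}


def degradation(acc: Dict[str, float],
                baseline: str = "in_distribution") -> Dict[str, float]:
    """Return the accuracy drop of each shifted bucket relative to the baseline."""
    base = acc[baseline]
    return {label: base - a for label, a in acc.items() if label != baseline}
```

Feeding it graded outputs from an in-distribution test set plus task-, length-, and format-shifted variants gives one drop per shift type. Note that final-answer accuracy alone would miss the case the paragraph warns about, where the reasoning text stays fluent but is no longer valid, so a fuller check would also grade the intermediate steps.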

This is why the two call for different *methods* of installation. For domain knowledge, the corpus favors approaches that restructure what the model holds: knowledge-graph curricula that compose primitives into domain expertise (see "Can knowledge graphs teach models deep domain expertise?"), and RL from augmented generation (RLAG), which internalizes coherent knowledge structures better than supervised fine-tuning because it rewards rational explanation, not just token-correct answers (see "Can reinforcement learning embed domain knowledge more effectively than supervised fine-tuning?"). For reasoning, the corpus leans on emergence: complex domain reasoning arises from RL on hard problems with only simple accuracy rewards (see "Can simple rewards alone teach complex domain reasoning?"), and reasoning competence can even emerge as a side effect of ordinary language modeling when models learn to generate rationales token-by-token on arbitrary text (see "Can models learn reasoning from predicting any text?").
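The contrast between a "simple accuracy reward" and a reward that also scores the explanation can be made concrete with a small Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: the `Rollout` structure, both function names, the externally supplied `explanation_score`, and the 0.5 weight are assumptions, not the reward definitions used in the cited work or in any RL library.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled model response (hypothetical structure for this sketch)."""
    answer: str
    explanation: str


def accuracy_reward(rollout: Rollout, gold_answer: str) -> float:
    """Outcome-only reward: 1.0 if the final answer matches the gold answer, else 0.0."""
    match = rollout.answer.strip().lower() == gold_answer.strip().lower()
    return 1.0 if match else 0.0


def accuracy_plus_explanation_reward(
    rollout: Rollout,
    gold_answer: str,
    explanation_score: float,
    weight: float = 0.5,  # assumed blend weight, not taken from any paper
) -> float:
    """Blend answer accuracy with a score for how rational the explanation is.

    `explanation_score` is assumed to come from some external judge and to
    lie in [0, 1]; how that judge works is outside this sketch.
    """
    if not 0.0 <= explanation_score <= 1.0:
        raise ValueError("explanation_score must be in [0, 1]")
    acc = accuracy_reward(rollout, gold_answer)
    return (1.0 - weight) * acc + weight * explanation_score
```

Under the first function a correct answer with a nonsensical explanation earns full reward. Under the second it earns only part, which is the intuition behind rewarding explanation quality when the goal is internalized knowledge and not just a correct final token.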

The payoff for a curious reader is the catch the corpus keeps circling back to: these two kinds of transfer *interfere* with each other. Every domain-adaptation method has a domain-conditional sweet spot, and the visible win (a benchmark bump) often hides degradation in reasoning faithfulness, capability transfer, or format flexibility (see "How do domain training techniques actually reshape model behavior?"). So the choice isn't 'reasoning transfer *or* knowledge transfer'; the issue is that pushing hard on either one tends to quietly tax the other, and most training recipes are silent about which side they're spending down.

Sources (10 notes)

Why does reasoning training help math but hurt medical tasks?

Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Can reconstructing expert thinking improve reasoning transfer?

Training on expert texts augmented with reconstructed thought processes (self-talk, knowledge recall, verification) produces reasoning skills that transfer across domains and adapt depth to problem difficulty, outperforming standard continual pretraining by up to 8 points on hard problems.

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.


Can knowledge graphs teach models deep domain expertise?

Fine-tuning a 32B model on 24,000 reasoning tasks derived from medical knowledge graph paths produces state-of-the-art performance across 15 medical domains, demonstrating that structured knowledge composition matters more than scale.

Can reinforcement learning embed domain knowledge more effectively than supervised fine-tuning?

RLAG rewards both answer accuracy and explanation rationality by cycling between augmented and unaugmented generation, progressively internalizing coherent knowledge structures. This outperforms SFT because it prioritizes reasoning quality over token-level correctness.

Can simple rewards alone teach complex domain reasoning?

Medical AI systems and o3 demonstrate that sophisticated domain reasoning emerges naturally from RL training on difficult problems with only basic accuracy signals, without requiring explicit chain-of-thought distillation from teacher models.

Can models learn reasoning from predicting any text?

Quiet-STaR trains language models to generate rationales at every token position during pretraining on arbitrary internet text, enabling general reasoning without task-specific datasets. Rationale quality is judged by predictive accuracy rather than labeled correctness, allowing reasoning competence to emerge as a side effect of improved language modeling.

How do domain training techniques actually reshape model behavior?

Research shows every adaptation method—from parameter-efficient tuning to knowledge graph curricula—has optimal conditions tied to specific domains. The key finding: visible benefits like performance gains often come with hidden degradation in reasoning faithfulness, capability transfer, and format flexibility.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a researcher evaluating claims about knowledge transfer vs. reasoning transfer in LLMs. The question remains: do cross-domain reasoning and domain-specific knowledge truly transfer via different mechanisms, and do they genuinely interfere?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025; treat as snapshot, not current state:

• Knowledge retrieval operates in lower network layers and reasoning adjustment in higher layers, suggesting a physical separation that enables reasoning transfer but not knowledge transfer (~2025-07, arXiv:2507.18178).
• Procedural knowledge (broad patterns across documents) drives reasoning generalization; memorized facts do not (~2024-11, arXiv:2411.12580).
• Chain-of-thought reasoning degrades predictably under distribution shifts (task, length, format); 'transferable' reasoning is bounded to near-training neighborhoods (~2025-08, arXiv:2508.01191).
• Prompting cannot inject new knowledge — only activate latent knowledge already in weights; knowledge transfer requires weight updates (~2025-02, arXiv:2502.10708).
• RL-from-augmented-generation embeds domain knowledge more effectively than SFT; RL on hard problems produces emergent domain reasoning (~2025-09, arXiv:2509.20162).

Anchor papers (verify; mind their dates):
• arXiv:2411.12580 (2024-11): Procedural Knowledge in Pretraining Drives Reasoning
• arXiv:2507.18178 (2025-07): Decoupling Knowledge and Reasoning (Cognitive Dual-System)
• arXiv:2508.01191 (2025-08): Chain-of-Thought Reasoning Distribution Lens
• arXiv:2509.20162 (2025-09): RL from Augmented Generation for Domain Knowledge

Your task:
(1) RE-TEST THE INTERFERENCE CLAIM. For each constraint (layer separation, distribution boundedness, prompt-knowledge limit), assess whether recent model scaling, new RL variants (process reward, outcome reward hybrids), or test-time compute (tree search, ensemble reasoning) have relaxed it. Separate durable mechanism (procedural vs. memorized knowledge) from perishable limitation (whether RL truly resolves the trade-off).
(2) Surface work from the last 6 months that CONTRADICTS the bidirectional interference finding — i.e., cases where knowledge and reasoning co-improve, or where one transfer method benefits both.
(3) Propose 2 new research questions that assume: (a) the layer separation may be less rigid in newer architectures; (b) the interference may be tunable via curriculum or multi-objective training.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
