Does medical AI need knowledge or reasoning more?
Medical and mathematical domains may require fundamentally different AI training priorities. If medical accuracy depends primarily on factual knowledge while math depends on reasoning quality, should we build and evaluate these systems differently?
The KI/InfoGain framework from the Knowledge or Reasoning paper produces a finding that should reshape how domain AI is evaluated and developed: domains differ in the relative importance of knowledge accuracy versus reasoning quality. In medical domains, KI (knowledge correctness) correlates more strongly with final accuracy than InfoGain (reasoning quality) across four of five benchmarks. In mathematical domains, the pattern inverts — reasoning quality matters more than domain knowledge retrieval.
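As a rough illustration of what the two metrics measure, the sketch below scores a single reasoning trace. It is a simplified reading, not the paper's exact formulation: KI is taken as the fraction of steps whose stated knowledge checks out, and InfoGain as the mean per-step reduction in surprisal of the gold answer. The `Step` record, its fields, and the `p_prior` parameter are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Hypothetical per-step annotations; a real pipeline would obtain these
    # from a fact-checker and from the model's answer probabilities.
    knowledge_correct: bool  # did the fact used in this step check out?
    p_answer: float          # probability of the gold answer after this step


def knowledge_index(steps: List[Step]) -> float:
    """Fraction of steps whose knowledge was judged correct (simplified KI)."""
    if not steps:
        return 0.0
    return sum(s.knowledge_correct for s in steps) / len(steps)


def info_gain(steps: List[Step], p_prior: float) -> float:
    """Mean per-step drop in surprisal (-log p) of the gold answer (simplified InfoGain)."""
    if not steps:
        return 0.0
    total = 0.0
    prev = p_prior
    for s in steps:
        # Positive when the step makes the gold answer more likely.
        total += math.log(s.p_answer) - math.log(prev)
        prev = s.p_answer
    return total / len(steps)
```

Under this reading, a trace can score high on one metric and low on the other: steps that cite correct facts without moving toward the answer raise KI but not InfoGain, and vice versa.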
This is not just a curiosity. It has direct implications for which training strategy to prioritize.
Medical AI: knowledge accuracy is the primary driver. The primary risk in medical reasoning is invoking the wrong clinical fact: the wrong drug interaction, symptom correlation, or diagnostic criterion. A model that reasons well from incorrect clinical knowledge will reach confidently wrong conclusions. This is why the question "Does RL improve domain reasoning by adding knowledge or removing it?" matters specifically in medical contexts: RL's pruning function targets the primary failure mode. It is also the answer to "Why doesn't mathematical reasoning transfer to medicine?": mathematical reasoning strength does not compensate for absent clinical knowledge.
Mathematical AI: reasoning quality is the primary driver. Mathematical problems are well-defined, and the relevant facts (formulas, axioms, logical rules) are generally in the training distribution of any large model. The ceiling is not knowledge retrieval but the quality of the inferential chain — whether each step correctly follows from the previous one. This makes models with strong reasoning training (R1-distilled, o1-style) well-suited to mathematical domains in ways they are not for medical ones.
Verifier-guided search + RL for medical reasoning (HuatuoGPT-o1): The medical domain's narrower scope enables automated verification that general domains lack. HuatuoGPT-o1 constructs verifiable medical problems, then uses True/False verifier feedback to guide trajectory search: the model initializes a CoT and, if the verifier rejects it, extends the chain by sampling a strategy (backtracking, exploring a new path, verification, or correction). Successful trajectories are used for SFT, and RL with PPO then refines the model further. Only 40K verifiable problems are needed to outperform both general and medical-specific baselines. Because medicine is knowledge-dominant, verifier-guided search is especially valuable: it catches factual errors that pure reasoning training cannot.
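The search loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not HuatuoGPT-o1's implementation: `init_cot`, `extend`, and `verify` are hypothetical callables standing in for the model and the verifier, and `max_extensions` is an assumed search budget.

```python
import random
from typing import Callable, List, Optional

# The four extension strategies named in the text.
STRATEGIES = ("backtracking", "new_path", "verification", "correction")


def search_trajectory(
    problem: str,
    init_cot: Callable[[str], str],
    extend: Callable[[str, List[str], str], str],
    verify: Callable[[str, str], bool],
    max_extensions: int = 3,
    rng: Optional[random.Random] = None,
) -> Optional[List[str]]:
    """Return the chain of attempts ending in a verified one, or None if the budget runs out."""
    rng = rng or random.Random(0)

    # Start from an initial chain of thought.
    chain = [init_cot(problem)]
    if verify(problem, chain[-1]):
        return chain

    # While the verifier rejects the latest attempt, sample a strategy
    # and extend the chain with a new attempt.
    for _ in range(max_extensions):
        strategy = rng.choice(STRATEGIES)
        chain.append(extend(problem, chain, strategy))
        if verify(problem, chain[-1]):
            return chain

    # No verified trajectory within the budget.
    return None
```

Trajectories returned by such a loop would feed the SFT stage. Dropping problems that return `None` is a choice made in this sketch, not something the source specifies.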
The broader point: "domain AI" is not a monolithic problem. The right metric, the right training approach, and the right architecture depend on whether the domain is more knowledge-sensitive or more reasoning-sensitive. A single evaluation framework (accuracy benchmarks) hides this distinction by collapsing the two into one number.
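One way to keep the two factors apart is to report, per domain, how strongly each metric tracks accuracy rather than accuracy alone. The sketch below is a minimal illustration: `pearson` and `dominant_factor` are hypothetical helpers, and the per-benchmark scores passed in would come from an evaluation run; none are taken from the paper.

```python
import math
from typing import Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def dominant_factor(
    ki: Sequence[float],
    info_gain: Sequence[float],
    accuracy: Sequence[float],
) -> Tuple[str, float, float]:
    """Label a domain by whichever metric correlates more strongly with accuracy."""
    r_ki = pearson(ki, accuracy)
    r_ig = pearson(info_gain, accuracy)
    label = "knowledge" if r_ki > r_ig else "reasoning"
    return label, r_ki, r_ig
```

Reporting both coefficients alongside the label keeps borderline domains visible instead of forcing them into one bucket.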
This connects to "When does explicit reasoning actually help model performance?": the task-type specificity claim in that note also applies at the domain level. Math and logic are the paradigmatic derivation domains, while medical reasoning sits closer to the continuous-judgment end.
Inquiring lines that use this note as a source 5
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Why does general reasoning not transfer to knowledge-intensive medical domains?
- Why do medical and mathematical tasks require fundamentally different model capabilities?
- Why do medical diagnoses require human judgment even with AI assistance?
- What makes reasoning auditable in medical AI decision support?
- Why does contextual judgment matter more in law and medicine than in mathematics?
Related concepts in this collection 5
- Does RL improve domain reasoning by adding knowledge or removing it?
  When reinforcement learning improves reasoning in specialized domains like medicine, is it teaching models new facts or preventing them from using wrong ones? Understanding this distinction matters for how we design RL training.
  RL pruning is the right tool for knowledge-dominant domains
- Why doesn't mathematical reasoning transfer to medicine?
  Can models trained to reason well about math apply those skills to medical domains through fine-tuning? This explores whether reasoning ability is truly domain-agnostic or constrained by domain-specific knowledge requirements.
  transfer failure is specifically a knowledge-dominant domain problem
- Does supervised fine-tuning actually improve reasoning quality?
  While SFT boosts final-answer accuracy, does it degrade the quality and informativeness of the reasoning steps that justify those answers? This matters for high-stakes domains requiring auditable decision-making.
  SFT's cost (reasoning quality) is more tolerable in knowledge-dominant domains; more damaging in reasoning-dominant ones
- When does explicit reasoning actually help model performance?
  Explicit reasoning improves some tasks but hurts others. What determines whether step-by-step reasoning chains are beneficial or harmful for a given problem?
  task-type specificity at a finer level than domain
- Why do language models fail confidently in specialized domains?
  LLMs perform poorly on clinical and biomedical inference tasks while remaining overconfident in their wrong answers. Do standard benchmarks hide this fragility, and can prompting techniques fix it?
  confirms knowledge-dominance from the NLI perspective: clinical/biomedical domains have high knowledge requirements and correspondingly high overconfidence when knowledge is absent
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
- The Incomplete Bridge: How AI Research (Mis)Engages with Psychology
- Understanding, explaining, and utilizing medical artificial intelligence
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
- LLMs can implicitly learn from mistakes in-context
Original note title
domain competency requirements differ by domain — medical is knowledge-dominant while math is reasoning-dominant