INQUIRING LINE

How many document exposures does procedural knowledge versus factual information require?

This explores a finding about how models learn differently from their training data: factual recall leans on seeing the specific target fact (narrow, memorized), while reasoning skills get assembled from many documents that never state the answer.


The short version is that the two kinds of knowledge sit at opposite ends of a spectrum. When researchers traced 5 million pretraining documents to see what a model actually leaned on for a given output, factual retrieval turned out to be narrow and document-specific: to answer a fact reliably, the model needs to have memorized that exact fact from a small number of sources that state it directly. Reasoning was the opposite. It drew on a broad, diffuse spread of documents demonstrating *procedures* (how to work through a problem), none of which contained the target answer (see "Does procedural knowledge drive reasoning more than factual retrieval?"). So 'how many exposures' isn't one number: a fact wants a few direct hits, a procedure wants many indirect ones.
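One way to make "narrow versus diffuse" concrete is to ask how much of the total influence on an output is carried by the few most influential documents. The sketch below is only an illustration of that idea, not the method used in the cited study: `top_k_share` is a hypothetical helper, and the influence scores are made-up toy numbers.

```python
def top_k_share(scores, k):
    """Fraction of total positive influence carried by the k highest-scoring documents."""
    positive = sorted((s for s in scores if s > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        return 0.0
    return sum(positive[:k]) / total


# Toy, invented influence scores over 1,000 documents (assumption: not real data).
# Factual query: two documents state the fact directly, the rest barely matter.
factual = [50.0, 40.0] + [0.01] * 998
# Reasoning query: every document contributes a little, none dominates.
procedural = [0.1] * 1000

factual_share = top_k_share(factual, 2)
procedural_share = top_k_share(procedural, 2)

print(f"factual top-2 share:    {factual_share:.3f}")
print(f"procedural top-2 share: {procedural_share:.3f}")
```

In this toy setup the factual profile puts most of its weight on two documents, while the procedural profile spreads it almost evenly. That is the shape the "few direct hits versus many indirect ones" summary describes.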

What makes this counterintuitive is that procedural knowledge generalizes *because* it's spread thin across many examples. The model isn't copying one worked solution; it's averaging over a style of working that recurs in different guises, which is why reasoning transfers to problems it never saw. Factual recall can't transfer the same way — if the fact wasn't in the training data, no amount of reasoning style conjures it.

The library has a striking sibling to this finding. A separate line of work shows that models can reconstruct knowledge that was *never stated in any single document*, piecing it together from implicit hints scattered across the training set; one example is inferring a censored city's identity from fragments of distance relationships (see "Can LLMs reconstruct censored knowledge from scattered training hints?"). That is procedural-style, distribution-wide learning at work on facts: when direct exposure is denied, the model falls back on connecting many weak signals. It is the same kind of mechanism the reasoning paper describes, just pointed at a fact instead of a method.

There's a practical seam here too. If factual knowledge is the brittle, exposure-hungry part, then knowing *when to look it up* rather than recall it becomes the real skill. One approach frames each reasoning step as a decision about whether to retrieve external knowledge or trust the model's internal store, and gains accuracy precisely by switching to retrieval for the narrow facts while leaning on parametric procedure for the reasoning (see "When should language models retrieve external knowledge versus use internal knowledge?"). The division of labor mirrors the pretraining finding: retrieve the facts, internalize the procedures.
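The per-step choice can be sketched in a few lines. This is a deliberately simplified stand-in: the cited approach learns when to retrieve, whereas the sketch below swaps in a fixed confidence threshold. Every name here (`Step`, `answer_step`, the `confidence` field, the toy `recall` and `retrieve` functions) is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Step:
    question: str
    confidence: float  # hypothetical self-estimated confidence in [0, 1]


def answer_step(
    step: Step,
    recall: Callable[[str], str],
    retrieve: Callable[[str], str],
    threshold: float = 0.7,
) -> Tuple[str, str]:
    """Return (source, answer): retrieve when confidence is low, else answer from memory.

    Assumption: a fixed threshold stands in for a learned retrieval policy.
    """
    if step.confidence < threshold:
        return "retrieved", retrieve(step.question)
    return "parametric", recall(step.question)


# Toy stand-ins for a model's internal answer and an external lookup.
def recall(question: str) -> str:
    return f"internal answer to: {question}"


def retrieve(question: str) -> str:
    return f"looked-up answer to: {question}"


steps = [
    Step("Which operation should be applied next?", confidence=0.9),  # procedure
    Step("In what year was the treaty signed?", confidence=0.3),      # narrow fact
]

results = [answer_step(s, recall, retrieve) for s in steps]
for source, answer in results:
    print(source, "->", answer)
```

The design point is the branch, not the threshold value: narrow facts the model is unsure of go to retrieval, while procedural steps stay with the model's internal knowledge.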

The thing worth carrying away: the question of 'how many exposures' quietly reframes what we think a language model is doing. It isn't one memory system with a single learning curve — it's two. One bank fills up by direct repetition and stays local; the other distills a way of operating from a whole distribution and travels.


Sources (3 notes)

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Can LLMs reconstruct censored knowledge from scattered training hints?

Language models perform out-of-context reasoning across the full training distribution, reconstructing information never explicitly stated in any single document. Experiments show models can infer city identities from scattered distance relationships and apply them downstream without in-context learning.

When should language models retrieve external knowledge versus use internal knowledge?

DeepRAG models each reasoning step as a Markov Decision Process where the model learns when to retrieve versus rely on parametric knowledge. The reported 21.99% improvement in answer accuracy comes from better-targeted retrieval and elimination of noise from unnecessary external knowledge.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a researcher auditing whether the exposure-count divide between procedural and factual knowledge still holds. A curated library (2024–2026) surfaced this claim: is it dated, refined, or overturned?

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026. The library emphasized:
- Factual retrieval is narrow and document-specific: a model needs direct, repeated exposures to memorize individual facts reliably (2024-11, arXiv:2411.12580).
- Procedural knowledge generalizes from *diffuse* exposure across many documents demonstrating problem-solving style, not copying one solution (2024-11).
- Models reconstruct never-stated knowledge by piecing implicit hints across the training set — a procedural-style mechanism applied to facts (2024-06, arXiv:2406.14546).
- Retrieval-augmented reasoning improves accuracy by treating each step as a choice: retrieve for narrow facts, rely on parametric procedure for reasoning (2025-02, arXiv:2502.01142).
- Recent work (2025–2026) explores agentic context engineering, continuous latent reasoning, and vulnerability of delegation (arXiv:2510.04618, arXiv:2511.18659, arXiv:2604.15597).

Anchor papers (verify; mind their dates):
- arXiv:2411.12580 (2024-11): Procedural Knowledge in Pretraining Drives Reasoning
- arXiv:2406.14546 (2024-06): Connecting the Dots — latent structure inference
- arXiv:2502.01142 (2025-02): DeepRAG — per-step retrieval decisions
- arXiv:2511.18659 (2025-11): CLaRa — continuous latent reasoning

Your task:
(1) RE-TEST THE DIVIDE. For each claim above, ask: have newer model scales, in-context learning, synthetic training, or architectural innovations (e.g., retrieval-in-weights, mixture-of-experts, continuous latent codes) since *blurred or collapsed* the fact/procedure boundary? Judge whether the durable question — "do different knowledge types require different training regimes?" — still stands, or whether newer methods now handle both uniformly. Cite what moved the regime, and flag where the divide still appears robust.
(2) Surface the strongest *contradicting* or *superseding* work from the last ~6 months. Does any recent paper argue that the procedural/factual split is an artifact of older pretraining paradigms, not a fundamental property?
(3) Propose 2 research questions that assume the regime may have shifted: e.g., "If in-context examples now enable fine-grained fact learning without document memorization, does the exposure-count model collapse?" or "Can procedural knowledge be taught *directly* via fewer exposures if scaffolded with synthetic exemplars?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
