SYNTHESIS NOTE

Can small models learn to ground answers in context?

Does model size determine whether a system can cite evidence, refuse to answer, and reason over passages jointly? Or can training data alone teach these behaviors at any scale?

Synthesis note · 2026-06-27 · sourced from Reasoning Critiques

The default story of the last few years is that capability tracks scale: bigger weights absorb more world knowledge, and knowledge is what makes a model useful. OCC-RAG inverts the premise for one important task. For context-grounded QA, parametric knowledge is not an asset — it is the contamination source, because a model that answers from memory is a model that can confabulate when the supplied passages are thin. Therefore the design goal becomes the opposite of scale: produce a small model that reasons over the provided context and ignores what it memorized.

What matters is that three properties usually treated as separate (multi-hop reasoning over passages, literal-quote citation, and calibrated abstention) were trained jointly into models at 0.6B and 1.7B parameters via a synthetic corpus of 3M+ examples, and the result beats stronger sub-4B baselines. This reframes faithfulness as a supervision-format problem rather than a capacity problem. The curriculum teaches the model what to do when evidence is insufficient (abstain) and how to tie each claim to a literal span, which is exactly the behavior that the note "Can RAG systems refuse to answer without reliable evidence?" identifies as the load-bearing RAG primitive.
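To make "supervision format" concrete, here is a minimal sketch of what one training record and its validity check might look like. The field names, the `GroundedExample` class, and the `is_well_formed` rule are assumptions made for illustration, not the OCC-RAG schema, which this note does not describe. The toy data is likewise invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroundedExample:
    """One hypothetical supervision record (all field names are illustrative)."""
    question: str
    passages: List[str]
    answer: Optional[str]  # None means the target behavior is abstention
    quotes: List[str] = field(default_factory=list)


def is_well_formed(ex: GroundedExample) -> bool:
    """Return True if the target obeys the assumed grounding format."""
    if ex.answer is None:
        # Abstention target: no answer, so there must be no citations either.
        return not ex.quotes
    # Answer target: at least one non-empty quote, and every quote must be
    # a verbatim substring of some supplied passage (literal-span citation).
    return bool(ex.quotes) and all(
        q.strip() and any(q in p for p in ex.passages) for q in ex.quotes
    )


passages = [
    "Ada Lovelace worked with Charles Babbage on the Analytical Engine.",
    "Charles Babbage was born in London.",
]

# Multi-hop target: the answer needs one quote from each passage.
multi_hop = GroundedExample(
    question="Where was Ada Lovelace's collaborator born?",
    passages=passages,
    answer="London",
    quotes=["worked with Charles Babbage", "Charles Babbage was born in London"],
)

# Abstention target: the passages do not contain the requested fact.
abstain = GroundedExample(
    question="In what year was Charles Babbage born?",
    passages=passages,
    answer=None,
)

# Malformed target: the quote paraphrases instead of copying a literal span.
paraphrased = GroundedExample(
    question="Where was Charles Babbage born?",
    passages=passages,
    answer="London",
    quotes=["Babbage was a Londoner"],
)
```

Under these assumptions, `is_well_formed` accepts `multi_hop` and `abstain` and rejects `paraphrased`. A filter of this kind is one plausible way a synthetic corpus could enforce all three behaviors in a single record format, which is the sense in which faithfulness becomes a property of the data rather than of the parameter count.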

The strongest counterargument is that small models simply have less to hallucinate from, so abstention is cheap for them, and the result might not transfer to frontier models whose parametric pull is far stronger. But that is also the point: if faithfulness is a learnable format, the lever is the training data, not the parameter count, and the same curriculum could in principle be applied at any scale. The citation behavior also carries a risk worth flagging: given the concern raised in the note "Do users trust citations more when there are simply more of them?", literal-quote citations can manufacture trust independent of whether the grounding is actually sound.

Original note title

faithfulness is a training curriculum not a scale property — small models can learn context-grounding, citation, and abstention jointly