OCC-RAG: Optimal Cognitive Core for Faithful Question Answering
Recent progress in the development of language models has been defined by scale, with each generation absorbing more of the world’s knowledge into its weights. However, many practical applications benefit more from robust reasoning than from extensive parametric knowledge. In this setting, task-specialized small language models (SLMs) offer a principled design choice. We introduce Optimal Cognitive Core (OCC), a family of SLMs built around this premise. As a variant of OCC, we present OCC-RAG, optimized for faithful question answering (QA) grounded in the provided context. This task directly aligns with the OCC design approach, requiring multi-hop reasoning over supplied passages while ignoring memorized knowledge. To train OCC-RAG, we implement a novel pipeline for synthesizing multi-context, multi-hop QA data at scale, producing a corpus of over three million examples targeting multi-hop reasoning, strict context faithfulness, and calibrated abstention. We release OCC-RAG-0.6B and OCC-RAG-1.7B, both mid-trained on this corpus. The models produce structured reasoning traces with source citations grounded in literal quotes from the context.
Introduction. Frontier language models grow larger and absorb ever more of the world’s knowledge, yet many practical applications benefit more from compact, task-specialized architectures (Belcak et al., 2025). Small Language Models (SLMs) have demonstrated competitive or superior performance across commonsense reasoning (Cao et al., 2026), mathematical reasoning (Liu et al., 2023), tool calling (Zhang et al., 2025), and retrieval-augmented generation (Schick & Schütze, 2021). Furthermore, fine-tuning SLMs on targeted datasets enables cost-effective adaptation to specialized use cases, an advantage especially pronounced when computational resources are limited (Gururangan et al., 2020). One such task is Context Question Answering (Context QA), where models answer questions based exclusively on a provided context, generating responses grounded in or reasoning from that input (Radevski et al., 2025; Aushev et al., 2025). A central requirement for such systems is faithfulness: producing outputs strictly derived from the given context while disregarding parametric knowledge.
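To make the faithfulness requirement concrete, the following is a minimal sketch of one way to check that cited quotes are literal spans of the supplied context. It is an illustration under stated assumptions and not the evaluation procedure used in this work: the `Citation` record, the passage identifiers, and the function `ungrounded_citations` are hypothetical names, and matching is exact substring matching after whitespace normalization.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    source_id: str  # hypothetical passage identifier, e.g. "p1"
    quote: str      # text the answer claims to quote verbatim


def _normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def ungrounded_citations(passages: dict[str, str],
                         citations: list[Citation]) -> list[Citation]:
    """Return the citations whose quote is not a literal span of the cited passage.

    Assumption: a citation is grounded only if its whitespace-normalized quote
    is a non-empty substring of the whitespace-normalized passage it points to.
    """
    failures = []
    for citation in citations:
        passage = passages.get(citation.source_id)
        quote = _normalize(citation.quote)
        if passage is None or not quote or quote not in _normalize(passage):
            failures.append(citation)
    return failures


# Illustrative data only; not taken from the training corpus.
passages = {
    "p1": "Marie Curie was born in Warsaw.\nShe moved to Paris in 1891.",
    "p2": "Paris is the capital of France.",
}
citations = [
    Citation("p1", "moved to Paris in 1891"),     # literal span of p1
    Citation("p2", "Paris is the largest city"),  # not present in p2
    Citation("p3", "any text"),                   # cites a missing passage
]
failures = ungrounded_citations(passages, citations)
```

In this sketch, an empty `failures` list means every quote can be traced to the context. A non-empty list flags citations that a stricter pipeline might treat as a faithfulness violation.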
Discussion / Conclusion. We presented OCC-RAG, a family of small language models designed for faithful context-grounded question answering. By combining large-scale synthetic mid-training, explicit reasoning traces, and citation-aware output formatting, OCC-RAG learns to answer only from the provided context and to abstain when evidence is insufficient. Across multi-hop reasoning, faithfulness, and refusal benchmarks, the released 0.6B and 1.7B checkpoints consistently outperform stronger baselines under 4B parameters and remain competitive with much larger models, while using substantially less compute. A key takeaway from this work is that faithfulness does not depend on scale alone: it can be learned through the right training curriculum and supervision format. In particular, our synthetic corpus shows that multi-hop reasoning, context grounding, and calibrated abstention can be jointly trained in small models without sacrificing efficiency.