MeMo: Memory as a Model

Paper · arXiv 2605.15156 · Published May 14, 2026

Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporate new knowledge. In this paper, we introduce MEMO (Memory as a Model), a modular framework that encodes new knowledge into a dedicated MEMORY model while keeping the LLM parameters unchanged. Compared to existing methods, MEMO offers several advantages: (a) it captures complex cross-document relationships, (b) it is robust to retrieval noise, (c) it avoids catastrophic forgetting in the LLM, (d) it does not require access to the LLM’s weights or output logits, enabling plug-and-play integration with both open and proprietary closed-source LLMs, and (e) its retrieval cost is independent of corpus size at inference time. Our experimental results on three benchmarks, BrowseComp-Plus, NarrativeQA, and MuSiQue, show that MEMO achieves strong performance compared to existing methods across diverse settings.

Introduction. Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks [1–3]. Despite their successes, these models are effectively frozen for extended periods after pretraining [4] until subsequent updates, causing their pretrained knowledge to become increasingly outdated as the world evolves. For applications that require up-to-date [5, 6] or domain-specific [7, 8] knowledge, this dependence on static knowledge presents a fundamental architectural limitation [9, 10]. Retraining is a natural solution but remains prohibitively expensive at modern scales [11], motivating the need for an efficient mechanism to integrate new external knowledge into LLMs without full retraining. Existing methods for integrating new knowledge into LLMs fall into three categories. (1) Nonparametric methods retrieve relevant information from an external store at inference time via lexical [12], dense [13], or graph-based retrievers [14–17], before incorporating it through in-context learning [18, 19].

Discussion / Conclusion. We introduced MEMO, a modular framework for integrating updated or domain-specific knowledge into LLMs via a MEMORY model trained on a synthesized reflection QA dataset. MEMO addresses key limitations of existing methods: it bypasses the context constraints and limited cross-document reasoning of retrieval-based approaches, avoids costly and brittle parametric updates (including catastrophic forgetting), and removes the representation coupling of latent memory methods. Its core components are a data synthesis pipeline that captures both explicit facts and implicit relationships, and a multi-turn inference protocol that decomposes complex queries into targeted sub-queries to retrieve the needed information from the MEMORY model. While MEMO demonstrates strong performance, it has limitations regarding training cost, evaluation scope, and the capacity of the MEMORY model to scale with corpus size (see App. B). Empirically, MEMO outperforms strong baselines across diverse benchmarks. It also provides a scalable pathway for knowledge integration, supporting efficient updates and plug-and-play deployment with both open and proprietary closed-source LLMs.
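The multi-turn inference protocol described above can be illustrated with a minimal sketch. This is not the paper's implementation: the prompts, the `ASK:` / `ANSWER:` reply convention, the `max_turns` cap, and all function names are assumptions made for illustration. Both models are treated as black-box text-in/text-out callables, which is consistent with the claim that MEMO needs neither the LLM's weights nor its logits. The toy stand-ins at the bottom only exist to make the loop runnable.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: each model is a plain text-in/text-out function.
TextModel = Callable[[str], str]
Trace = List[Tuple[str, str]]  # (sub-query, memory reply) pairs


def build_prompt(question: str, notes: Trace, force_answer: bool = False) -> str:
    """Assemble the question plus all sub-query results gathered so far."""
    lines = [f"Question: {question}"]
    for sub_query, reply in notes:
        lines.append(f"Sub-query: {sub_query}")
        lines.append(f"Memory reply: {reply}")
    if force_answer:
        lines.append("Reply with 'ANSWER: <final answer>'.")
    else:
        lines.append("Reply with 'ASK: <sub-query>' or 'ANSWER: <final answer>'.")
    return "\n".join(lines)


def strip_tag(text: str, tag: str) -> str:
    """Remove a leading tag such as 'ASK:' if present."""
    text = text.strip()
    return text[len(tag):].strip() if text.startswith(tag) else text


def answer_with_memory(question: str, llm: TextModel, memory: TextModel,
                       max_turns: int = 4) -> Tuple[str, Trace]:
    """Let the LLM query the memory model turn by turn until it answers."""
    notes: Trace = []
    for _ in range(max_turns):
        reply = llm(build_prompt(question, notes)).strip()
        if reply.startswith("ANSWER:"):
            return strip_tag(reply, "ANSWER:"), notes
        sub_query = strip_tag(reply, "ASK:")
        notes.append((sub_query, memory(sub_query)))
    # Turn budget exhausted: request a final answer from what was gathered.
    final = llm(build_prompt(question, notes, force_answer=True))
    return strip_tag(final, "ANSWER:"), notes


# Toy stand-ins (assumptions) so the loop can be exercised without real models.
def toy_memory(sub_query: str) -> str:
    facts = {"Who wrote Book X?": "Author A",
             "Where was Author A born?": "City B"}
    return facts.get(sub_query, "unknown")


def toy_llm(prompt: str) -> str:
    if "City B" in prompt:
        return "ANSWER: City B"
    if "Author A" in prompt:
        return "ASK: Where was Author A born?"
    return "ASK: Who wrote Book X?"


answer, trace = answer_with_memory(
    "Where was the author of Book X born?", toy_llm, toy_memory)
```

In this toy run the two-hop question is resolved through two sub-queries before the final answer is produced. The only per-turn costs are one LLM call and one memory call, so nothing in the loop depends on the size of the underlying corpus.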
