Is agent memory capacity or quality the real bottleneck?
While more storage seems like the obvious solution to memory problems, what if the real constraint is actually curation—deciding what to keep, discard, and retrieve without degrading performance?
The intuitive picture of agent memory is a storage problem: give the agent more room and it remembers more. The system-scaling analysis rejects this. "Memory is not merely a storage layer; the harder problem is memory quality" — what to store, what to discard, how to retrieve the right information at the right time, and how to avoid staleness, drift, contamination, and over-generalization. Adding capacity without curation makes things worse: more stored material means more stale entries to retrieve, more opportunities for contaminated content to surface, and more room for over-generalized lessons to misfire on cases they do not fit.
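A toy sketch of why capacity without curation makes retrieval worse. It assumes a hypothetical schema in which each entry records a topic, a value, and a version number, with newer versions superseding older ones. The names `MemoryEntry`, `naive_retrieve`, and `curated_retrieve` are illustrative only and are not taken from the source. An accumulate-only store returns every match, stale ones included. The curated view keeps only the newest entry per topic.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MemoryEntry:
    topic: str    # what the entry is about (hypothetical schema)
    value: str    # stored content
    version: int  # higher = more recent observation of the same topic

def naive_retrieve(store: List[MemoryEntry], topic: str) -> List[MemoryEntry]:
    """Accumulate-only retrieval: every matching entry comes back, stale or not."""
    return [e for e in store if e.topic == topic]

def curated_retrieve(store: List[MemoryEntry], topic: str) -> List[MemoryEntry]:
    """Curated retrieval: superseded entries for a topic are filtered out."""
    matches = naive_retrieve(store, topic)
    if not matches:
        return []
    return [max(matches, key=lambda e: e.version)]

store = [
    MemoryEntry("search_endpoint", "/v1/search", 1),
    MemoryEntry("search_endpoint", "/v2/search", 2),
    MemoryEntry("search_endpoint", "/v3/search", 3),
]

print(len(naive_retrieve(store, "search_endpoint")))        # 3 (two are stale)
print(curated_retrieve(store, "search_endpoint")[0].value)  # /v3/search
```

Each additional version appended to the naive store adds one more stale candidate to every lookup. The curated view still returns a single current entry.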
This reframes memory engineering as a discarding and curation problem, not an accumulation one. The failure modes are specific and diagnosable — staleness (kept too long), drift (slowly diverging from ground truth), contamination (bad entries poisoning retrieval), over-generalization (a narrow lesson applied too broadly) — and each calls for different hygiene. The open question, which the paper leaves unresolved, is what discarding policy avoids all four without throwing away genuinely useful long-tail knowledge. The counterpoint is that aggressive forgetting risks losing rare-but-critical information, so quality is a trade-off, not a free win. This matters because it redirects effort from bigger memory stores toward better forgetting — the part of memory design that is hardest and least solved.
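The discarding policy remains an open question. What follows is a sketch of one way the four failure modes could map to separate checks. It is not a proposed answer. Everything in it is hypothetical: the `Memory` fields, the `audit` function, the 90-day and 3-episode thresholds, and the `critical` flag that shields long-tail entries from age-based forgetting. Drift is approximated by an external `verify` callback that returns `None` when no ground truth is available.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

@dataclass
class Memory:
    text: str
    age_days: float        # time since the entry was last confirmed
    source_trusted: bool   # provenance check result recorded at write time
    support: int           # number of episodes the lesson was distilled from
    scope: Set[str] = field(default_factory=set)  # contexts the entry claims to cover
    critical: bool = False  # rare-but-important: exempt from age-based discard

def audit(m: Memory,
          verify: Callable[[Memory], Optional[bool]],
          max_age_days: float = 90.0,
          min_support: int = 3) -> str:
    """Classify one entry as 'discard', 'narrow', or 'keep' (hypothetical policy)."""
    # Contamination: untrusted provenance can poison retrieval, so drop it outright.
    if not m.source_trusted:
        return "discard"
    # Drift: an external check contradicts the entry (None means "cannot verify").
    if verify(m) is False:
        return "discard"
    # Staleness: too old to trust, unless flagged as long-tail critical knowledge.
    if m.age_days > max_age_days and not m.critical:
        return "discard"
    # Over-generalization: broad scope backed by few episodes; narrow rather than delete.
    if len(m.scope) > 1 and m.support < min_support:
        return "narrow"
    return "keep"

# Hypothetical ground truth: True = confirmed, False = contradicted, missing = unknown.
ground_truth: Dict[str, bool] = {"api base path is /v1": False}

def verify(m: Memory) -> Optional[bool]:
    return ground_truth.get(m.text)

entries = [
    Memory("retry on HTTP 429 with backoff", 10, True, 12, {"http"}),
    Memory("api base path is /v1", 3, True, 6, {"api"}),
    Memory("legacy auth flow steps", 200, True, 5, {"auth"}),
    Memory("disaster-recovery failover steps", 400, True, 1, {"ops"}, critical=True),
    Memory("always use tool X first", 5, True, 1, {"search", "code", "email"}),
    Memory("tip scraped from an unvetted page", 1, False, 4, {"web"}),
]

for e in entries:
    print(f"{e.text!r}: {audit(e, verify)}")
```

In this sketch, the failover entry survives only because it carries the `critical` flag. Without that flag, the same age threshold that removes the stale auth steps would also delete it. This illustrates the trade-off: a lower `max_age_days` forgets more stale material but puts more rare knowledge at risk. The flag helps only if something reliably sets it.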
Inquiring lines that use this note as a source (25)
This note is a source for these synthesized inquiries.
- Should agents update memory after every turn or batch process sessions?
- Why do different agent memory architectures make incompatible granularity claims?
- How much does agent performance depend on demonstration quantity versus curation quality?
- What architectural changes would accelerate the cleanup phase?
- How does bottleneck automation differ from accessory work displacement?
- What tree depth is achievable before GPU memory becomes the bottleneck?
- What makes provenance infrastructure more critical than artifact quality?
- Can topology repair fix consolidation failures in agent memory?
- How does procedural memory granularity affect web agent performance?
- Which memory components trigger context-length problems in agents?
- Can pruning policies alone solve working memory bloat in agents?
- What is the right granularity level for agent memory to enable both reuse and composition?
- When does memory consolidation help agents instead of hurting performance?
- Can agent-controlled memory management outperform fixed consolidation schedules?
- Why do continuously consolidated agent memories eventually degrade below no-memory baseline?
- How do planning and memory compress agentic system costs?
- What makes timestamped knowledge repositories better than static memory?
- When should architects prioritize consolidation compute over larger context windows?
- What specific failure modes emerge when agents retrieve stale or contaminated memories?
- What makes memory curation harder to solve than simply expanding storage?
- How does durable memory quality shape agent performance over time?
- Why does memory consolidation degrade agent performance below baseline?
- How do memory tools and planning each contribute to agent efficiency?
- What separates artifact recall from persistent memory commitment in agents?
- What makes persistent, shared code artifacts from agents hard to manage at scale?
Related concepts in this collection (2)
- How should agents decide what memories to keep?
  Agent memory management splits between agents autonomously recognizing important information and programmatic triggers. Understanding this choice reveals why different memory architectures prioritize different information types.
  Describes the management machinery through which memory-quality decisions get made.
- Does including all conversation history actually help retrieval?
  Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helping it?
  A concrete instance where more stored context degrades quality, supporting discard over accumulation.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
- Useful Memories Become Faulty When Continuously Updated by LLMs
- OMNI-SIMPLEMEM: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
- Memory in the Age of AI Agents: A Survey — Forms, Functions and Dynamics
- Why Do Multi-agent LLM Systems Fail?
- Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
- AI Agents Need Memory Control Over More Context
Original note title
the real memory problem is quality not storage — what to discard and how to avoid drift contamination and over-generalization