What discarding policy prevents both stale entries and loss of rare critical knowledge?
This explores the eviction problem in agent memory and context systems — how to decide what to throw away so you don't keep dead weight, yet never delete the rare item that turns out to matter — and the corpus suggests the answer is less a 'policy' than an architecture choice about who decides and how.
Any system that keeps adding knowledge eventually has to decide what to drop, and a naive policy (oldest-out, smallest-out, least-used-out) fails twice over: it lets stale entries linger, and it silently discards the rare critical fact that's used once a year but is decisive when needed. The corpus doesn't hand you a single discard rule. Instead, the strongest recurring move is to stop framing it as deletion at all and reframe it as curation: a deliberate, structured revision of what's kept.
The clearest version of this is the 'evolving playbook' idea, where context is updated through incremental generation-reflection-curation loops rather than periodic full rewrites (see: Can context playbooks prevent knowledge loss during iteration?). The insight there is that the real enemy isn't age, it's *compression*: when you summarize or rewrite wholesale, you get 'brevity bias' and 'context collapse,' which is exactly how rare-but-critical detail gets erased, not because something judged it stale but because nothing protected it during a rewrite. Incremental edits keep the long tail alive while still letting the playbook grow.
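The contrast between incremental edits and wholesale rewrites can be sketched in a few lines. This is a minimal illustration, not the ACE implementation: the `Playbook` class, the `apply_delta` method, and the example entries are all hypothetical names chosen for this sketch. The point it shows is that a delta only touches the entries it names, so a rarely used entry survives every update by default.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Hypothetical playbook: entries keyed by id, edited only through small deltas."""
    entries: dict = field(default_factory=dict)

    def apply_delta(self, add=None, revise=None, retire=None):
        """Apply an incremental edit; entries the delta does not name are left untouched."""
        for key, text in (add or {}).items():
            # Adding never overwrites an existing entry.
            self.entries.setdefault(key, text)
        for key, text in (revise or {}).items():
            # Revising only changes entries that already exist.
            if key in self.entries:
                self.entries[key] = text
        for key in (retire or []):
            # Retirement is explicit and per-entry, never a side effect of a rewrite.
            self.entries.pop(key, None)


playbook = Playbook({
    "retry": "Retry idempotent calls up to 3 times.",
    "leap-year": "Billing export breaks on Feb 29; use the fallback path.",
})

# One curation round: a new lesson and a refinement. The rare entry is not mentioned,
# so it cannot be lost the way it could be in a summarize-and-replace step.
playbook.apply_delta(
    add={"cache": "Cache auth tokens for 10 minutes."},
    revise={"retry": "Retry idempotent calls with exponential backoff."},
)
```

The design choice worth noting is that removal requires naming the entry in `retire`; there is no code path in which an entry disappears because a rewrite happened to omit it.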
The deeper answer is that *who* does the discarding matters more than the rule. One line of work decouples a trainable curator from a frozen executor, and the curator learns to shift a skill repository away from generic verbose additions toward durable cross-task meta-strategies (see: Can a separate trained curator improve skill libraries better than frozen agents?). That's a learned eviction policy: instead of a fixed heuristic, a model learns which entries are strategically load-bearing. Compositional skill libraries push the same idea from the other side. They avoid the catastrophic forgetting of weight-update methods by storing skills externally and *composing* complex ones from simpler ones, so a rare primitive is retained precisely because higher-level skills depend on it (see: Can agents learn new skills without forgetting old ones?).
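The dependency argument can be made concrete with a small sketch. It assumes a skill library that records which simpler skills each skill is composed from; the function names, the `min_uses` threshold, and the example skills are hypothetical and not taken from VOYAGER or SkillOS. A skill is protected if it is frequently used itself or is reachable from a frequently used skill, so usage counts alone never evict a load-bearing primitive.

```python
def protected_skills(dependencies, roots):
    """Return every skill reachable from `roots` by following composition edges."""
    keep, stack = set(), list(roots)
    while stack:
        skill = stack.pop()
        if skill in keep:
            continue
        keep.add(skill)
        stack.extend(dependencies.get(skill, []))
    return keep


def eviction_candidates(dependencies, use_counts, min_uses=2):
    """Rarely used skills that no frequently used skill depends on, directly or transitively."""
    roots = {skill for skill, uses in use_counts.items() if uses >= min_uses}
    protected = protected_skills(dependencies, roots)
    return sorted(skill for skill in use_counts if skill not in protected)


# Hypothetical library: each skill maps to the simpler skills it is composed from.
dependencies = {
    "build_shelter": ["craft_planks", "place_block"],
    "craft_planks": ["chop_tree"],
}
use_counts = {
    "build_shelter": 12,
    "place_block": 9,
    "craft_planks": 1,   # rarely called directly, but build_shelter needs it
    "chop_tree": 0,      # never called directly, but craft_planks needs it
    "wave_hello": 0,     # rarely used and nothing depends on it
}

candidates = eviction_candidates(dependencies, use_counts)
```

A least-used-out policy would drop `chop_tree` first; here only `wave_hello` is offered for eviction, because nothing in active use is built on it.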
A second school sidesteps storage policy entirely by reconstructing knowledge on demand. If memory is a graph you traverse at query time, pruning paths based on accumulated evidence rather than pre-deciding what to keep, then 'staleness' and 'rarity' stop being storage decisions and become retrieval decisions made fresh each time (see: Can agents reconstruct memory on demand instead of retrieving it?). Relatedly, a stateful memory workspace that detects and *resolves contradictions* across retrieval cycles gives you a principled way to retire genuinely outdated entries: an entry is stale when newer evidence contradicts it, not when it's simply old (see: Can reasoning systems maintain memory across retrieval cycles?). Contradiction-resolution is arguably the cleanest 'discard the stale' signal in the whole corpus.
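The "stale means contradicted, not old" rule can be sketched as follows. This is a deliberately simplified toy, not how ComoRAG detects contradictions: it assumes facts are stored as subject/attribute/value triples with a logical timestamp, and it treats two facts as contradictory only when they fill the same slot with different values. The `Fact` type, the `reconcile` function, and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    observed_at: int  # logical timestamp; larger means more recent


def conflicts(a, b):
    """Assumed contradiction test: same slot, different value."""
    return (a.subject, a.attribute) == (b.subject, b.attribute) and a.value != b.value


def reconcile(memory, evidence):
    """Retire only facts that newer evidence contradicts; age alone retires nothing."""
    kept, retired = [], []
    for fact in memory:
        if conflicts(fact, evidence) and fact.observed_at < evidence.observed_at:
            retired.append(fact)
        else:
            kept.append(fact)
    # Do not add evidence that an equally recent or newer stored fact already contradicts.
    superseded = any(conflicts(f, evidence) and f.observed_at >= evidence.observed_at for f in kept)
    if not superseded and evidence not in kept:
        kept.append(evidence)
    return kept, retired


memory = [
    Fact("api", "base_url", "v1.example.test", observed_at=1),
    Fact("backup", "restore_key_location", "vault-7", observed_at=0),  # oldest entry
]
new_evidence = Fact("api", "base_url", "v2.example.test", observed_at=5)

kept, retired = reconcile(memory, new_evidence)
```

The oldest entry in the example is also the rare, critical one, and it survives because nothing contradicts it. Only the superseded `base_url` fact is retired.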
The surprising counterpoint is that aggressive forgetting can be a feature, not a bug, but only in the right place. Markov-style memoryless reasoning deliberately discards accumulated history so each step depends only on the current contracted problem, eliminating the historical baggage that bloats reasoning (see: Can reasoning systems forget history without losing coherence?). The lesson when you put these together: discard ephemeral *working state* ruthlessly, but treat durable *knowledge* as something to curate and reconstruct, never to age out. The system that avoids both failure modes isn't running a smarter LRU cache; it's separating transient context from learned knowledge and letting evidence (contradiction, dependency, strategic value), not a timestamp, decide what survives.
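The separation described above might look like this in outline. The sketch is an assumption-laden illustration rather than any cited system's design: `AgentMemory`, `Reason`, `contract`, and `retire` are hypothetical names. Working state is wiped at every step, while durable knowledge carries no timestamps and can only be removed by citing an evidence-based reason.

```python
from enum import Enum


class Reason(Enum):
    """Evidence-based grounds for retiring knowledge (illustrative, not exhaustive)."""
    CONTRADICTED = "newer evidence contradicts it"
    NO_DEPENDENTS = "nothing in active use depends on it"


class AgentMemory:
    def __init__(self):
        self.working = []     # transient reasoning state, discarded freely
        self.knowledge = {}   # durable entries; note there is no timestamp field
        self.retired = []     # audit trail of (key, reason) pairs

    def contract(self, current_problem):
        """Memoryless step: drop all working history, keep only the current problem."""
        self.working = [current_problem]

    def learn(self, key, text):
        self.knowledge[key] = text

    def retire(self, key, reason):
        """Knowledge leaves only with a stated, evidence-based reason."""
        if not isinstance(reason, Reason):
            raise TypeError("retiring knowledge requires a Reason")
        if key in self.knowledge:
            del self.knowledge[key]
            self.retired.append((key, reason))


memory = AgentMemory()
memory.learn("leap-year", "Billing export breaks on Feb 29; use the fallback path.")
memory.working += ["parse invoice", "tried regex A", "tried regex B"]

# Each reasoning step forgets its history without touching durable knowledge.
memory.contract("parse invoice with regex C")
```

Because `retire` refuses anything that is not a `Reason`, an age-based sweep has no way to express itself in this interface, which is the point of the separation.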
Sources (6 notes)
The ACE framework treats contexts as evolving playbooks using generation-reflection-curation loops rather than full rewrites. This prevents knowledge loss from compression and detail erosion, achieving +10.6% on agentic tasks and +8.6% on finance without labeled supervision.
SkillOS shows that separating a trainable curator from a frozen executor, grouped by task streams, causes skill repositories to shift from generic verbose additions toward actionable execution logic and cross-task meta-strategies. The trained curator generalizes across different executor backbones and domains.
VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.
MRAgent achieves up to 23% gains on reasoning tasks by reconstructing memory through active graph traversal that prunes paths based on accumulated evidence, while reducing token and runtime cost compared to fixed-retrieval pipelines.
ComoRAG demonstrates that iterative evidence acquisition with a persistent memory workspace outperforms stateless multi-step retrieval by detecting and resolving contradictions through deeper exploration, achieving up to 11% gains on complex queries.
Atom of Thoughts decomposes problems into DAGs and contracts them iteratively, ensuring each state depends only on the current problem—not prior steps. This memoryless approach eliminates historical baggage that bloats reasoning while maintaining answer equivalence.