How do the six memory components combine across explicit and implicit paths?
This explores how the different kinds of memory an LLM system uses fit together — and the corpus doesn't offer a single tidy 'six-component' scheme, so I'm reading it as: what memory types show up across these papers, and how do they split into things a model looks up (explicit) versus things baked into its weights and activations (implicit)?
The honest first thing to say is that no single note here hands you a canonical six-part taxonomy. What the corpus does give you is more interesting: memory is quietly fragmenting into distinct types, and those types fall along two paths. The explicit path is memory you retrieve: you store something, then look it up. The implicit path is memory that lives inside the network's weights or running activations, never retrieved as a discrete item, just expressed in behavior. The payoff of the question is seeing that the best systems don't pick one.
On the explicit side, "Can agents compress their own memory without losing critical details?" is the closest thing to a component list: it consolidates an agent's history into separate episodic, working, and tool-memory schemas, three named stores you reflect over and read back. "Does state-indexed memory outperform high-level workflow memory for web agents?" adds procedural memory, with a sharp twist: indexing 'how to do this' by the exact environment state and local action pairs beats storing tidy high-level workflows, because abstraction throws away the click-by-click specifics you actually need. "Can lookup memory and computation work together better than either alone?" adds a fifth flavor: an O(1) N-gram lookup table sitting beside the model, pure retrieval. And "Can cognition work by reusing memory instead of recomputing?" reframes all of this as intelligence navigating a topological memory of past inference paths rather than recomputing, which makes 'reuse what you've stored' the whole engine of thought.
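To make the explicit side concrete, here is a minimal sketch of the named stores sitting next to a state-indexed procedural memory. Everything in it is an assumption made for illustration: the class name `ExplicitMemory`, its field names, and the string encoding of states and actions are hypothetical and are not taken from DeepAgent or PRAXIS.

```python
from dataclasses import dataclass, field


@dataclass
class ExplicitMemory:
    """Toy container for retrievable memory (illustrative only)."""
    episodic: list[str] = field(default_factory=list)              # what happened, in order
    working: dict[str, str] = field(default_factory=dict)          # current goal and scratch state
    tool: dict[str, list[str]] = field(default_factory=dict)       # outcomes seen per tool
    procedural: dict[str, list[str]] = field(default_factory=dict) # exact state -> actions taken there

    def record_step(self, state: str, action: str, tool_name: str, outcome: str) -> None:
        # One interaction step updates three stores at once.
        self.episodic.append(f"{state}: {action} -> {outcome}")
        self.tool.setdefault(tool_name, []).append(outcome)
        # Index the procedure by the exact state, not by an abstract workflow name.
        self.procedural.setdefault(state, []).append(action)

    def recall_actions(self, state: str) -> list[str]:
        # Hash lookup keyed on the environment state; empty if the state is unseen.
        return self.procedural.get(state, [])


mem = ExplicitMemory()
mem.working["goal"] = "complete purchase"
mem.record_step("checkout_page", "click #pay", "browser", "ok")
print(mem.recall_actions("checkout_page"))
```

The design choice worth noticing is the key of `procedural`: it is the concrete state, so recall returns the specific actions that worked there rather than a summary of them.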
The implicit path is where it gets surprising. "Is long-context bottleneck really about memory or compute?" argues the real limit on long context isn't storage at all; it's the compute needed to consolidate evicted context into fast weights during an offline 'sleep' phase. That's memory you can't look up, because it has been dissolved into the model's parameters. Meanwhile the KV cache acts as a transient working memory: "Can recursive subtask trees overcome context window limits?" shows that rule-based pruning can manipulate 90% of the cache and still sustain accurate reasoning when the work is structured as recursive subtask trees, and "Can multiple LLMs coordinate without explicit collaboration rules?" shows that several models sharing one cache start coordinating without being told to. Here memory is a substrate that behavior emerges from, not a thing anyone retrieves.
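The pruning idea can be sketched with a toy stand-in for the cache. This is a simplified illustration under stated assumptions, not the Thread Inference Model's actual rules: the cache is modeled as a list of text entries tagged with a subtask id, and the single rule (drop a finished subtask's intermediate entries, keep its conclusion under the parent) is a hypothetical simplification.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    text: str
    task_id: int


class WorkingCache:
    """Toy stand-in for a KV cache: ordered entries tagged by subtask (illustrative only)."""

    def __init__(self) -> None:
        self.entries: list[CacheEntry] = []
        self.total_seen = 0  # every entry ever appended, to show how much was pruned

    def append(self, text: str, task_id: int) -> None:
        self.entries.append(CacheEntry(text, task_id))
        self.total_seen += 1

    def finish_subtask(self, task_id: int, parent_id: int, conclusion: str) -> None:
        # Assumed rule: when a subtask returns, discard its intermediate entries
        # and keep only its conclusion, attached to the parent task.
        self.entries = [e for e in self.entries if e.task_id != task_id]
        self.append(conclusion, parent_id)


cache = WorkingCache()
cache.append("goal: plan trip", 0)
for i in range(9):
    cache.append(f"scratch step {i}", 1)  # intermediate reasoning inside subtask 1
cache.finish_subtask(1, 0, "flights: booked")
print(len(cache.entries), "live entries out of", cache.total_seen, "seen")
```

In this toy run most of what was ever written is gone by the end, yet the parent task still holds the goal and the subtask's result, which is the structural property the note credits for reasoning past the context limit.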
The combination is the actual finding. "Can lookup memory and computation work together better than either alone?" reports a U-shaped scaling law: a balanced allocation between explicit lookup and implicit computation beats either alone, with the biggest gains in reasoning and code rather than raw retrieval. That's the lesson hiding inside your question: these aren't competing designs to choose between, they're complementary axes. Lookup gives you cheap, exact recall; computation in the weights, including consolidated fast weights, gives you generalization and skill. Systems get strong by routing across both, not by maximizing one.
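A crude way to see the two axes working together is a predictor that tries an exact N-gram table first and falls back to computation on a miss. This is only a sketch under assumptions: the `HybridPredictor` class and its `compute` callback are hypothetical, and a fallback router is a much simpler combination than the balanced allocation between lookup and Mixture-of-Experts routing that the Engram note describes.

```python
from typing import Callable


class HybridPredictor:
    """Toy next-token predictor: hash lookup first, computation on a miss (illustrative only)."""

    def __init__(self, n: int, compute: Callable[[tuple[str, ...]], str]) -> None:
        self.n = n
        self.table: dict[tuple[str, ...], str] = {}  # explicit path: N-gram -> next token
        self.compute = compute                       # implicit path: stand-in for the model
        self.hits = 0
        self.misses = 0

    def memorize(self, tokens: list[str]) -> None:
        # Store every length-n window together with the token that follows it.
        for i in range(len(tokens) - self.n):
            self.table[tuple(tokens[i:i + self.n])] = tokens[i + self.n]

    def predict(self, context: list[str]) -> str:
        key = tuple(context[-self.n:])
        if key in self.table:        # average O(1) dictionary lookup
            self.hits += 1
            return self.table[key]
        self.misses += 1
        return self.compute(key)     # anything unseen goes to computation


predictor = HybridPredictor(n=2, compute=lambda key: "<computed>")
predictor.memorize("the cat sat on the mat".split())
print(predictor.predict(["the", "cat"]))
print(predictor.predict(["a", "dog"]))
```

The lookup path answers only what it has seen verbatim, and the computed path handles everything else, which mirrors the division of labor between exact recall and generalization.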
So if you came looking for six neat boxes, the more useful thing to walk away with is the two-path map: episodic, working, tool, procedural, and N-gram memory on the explicit/retrieval side; fast-weight consolidation and the live KV cache on the implicit side — and the quiet consensus across these papers that the wins live in the combination, not the components.
Sources (7 notes)
DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.
PRAXIS shows that indexing procedures by environment state and local action pairs yields consistent accuracy and reliability gains across VLM backbones on the REAL benchmark, compared to higher-level workflow abstractions that lose click-by-click specifics.
Engram combines O(1) N-gram lookup with Mixture-of-Experts routing, revealing a U-shaped scaling law where balanced allocation to both mechanisms outperforms either alone. Gains appear largest in reasoning and code rather than pure retrieval.
Memory-Amortized Inference proposes intelligence arises from structured reuse of prior inference paths over topological memory, inverting RL's reward-forward logic into cause-backward reconstruction. This duality explains energy efficiency and suggests memory trajectories form the substrate of adaptive thought.
Research shows the bottleneck is not memory capacity but the compute required to consolidate evicted context into fast weights during offline sleep phases. Performance improves with more consolidation passes, following a test-time scaling pattern on harder reasoning tasks.
The Thread Inference Model demonstrates that reasoning structured as recursive subtask trees with rule-based KV cache pruning sustains accurate reasoning beyond context limits, even when manipulating 90% of the cache. This enables single models to replace multi-agent systems by handling full recursive reasoning internally.
Existing reasoning-capable models like QwQ and DeepSeek-R1 spontaneously formulate plans, detect redundancy, and adapt strategies when given shared access to a concurrent KV cache. This coordination emerges without fine-tuning, suggesting reasoning models already possess multi-agent collaboration capabilities.