INQUIRING LINE

How can memory shift from a passive datastore to an actively trained component?

This explores what it means for memory to stop being a passive lookup table, a store you only read from, and start behaving like a component that learns. The corpus answers in two camps that don't share vocabulary but are arguing about the same thing: one trains a memory module with its own weights; the other makes the memory's contents the thing that learns, so the weights are never touched at all. Noticing that split is the real payoff.

The first camp makes the memory itself a trainable module. Titans adds a neural long-term memory that is updated at test time and prioritizes surprising tokens, so deciding *what to store* is learned rather than fixed like a cache; this is what lets it scale past two million tokens without attention's quadratic cost [Can neural memory modules scale language models beyond attention limits?]. A related reframing says the long-context bottleneck was never about storage capacity at all: it is the *compute* needed to consolidate evicted context into fast weights during offline 'sleep' passes, and performance climbs with the number of consolidation passes [Is long-context bottleneck really about memory or compute?]. In both, memory is active because writing to it is an expensive, learned transformation, not a copy. Engram pushes the architectural version: pairing O(1) N-gram lookup memory with learned Mixture-of-Experts routing beats pure computation at equal parameters, and its U-shaped scaling law favors a balanced allocation over either mechanism alone. That suggests memory and computation are complementary axes to balance against each other rather than substitutes [Can lookup memory and computation work together better than either alone?].
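The Titans-style write can be sketched as a gradient step on an associative-recall loss, with momentum over past surprise and a forgetting gate. This is a simplified illustration under stated assumptions, not the paper's implementation: a linear memory matrix stands in for Titans' MLP memory, the gates `theta`, `eta`, and `alpha` are fixed constants rather than data-dependent, and keys and values are passed in directly instead of being projected from tokens. The class name `SurpriseMemory` is hypothetical.

```python
# Minimal sketch of a Titans-style surprise-gated memory write.
# Assumptions: a linear memory matrix replaces Titans' MLP memory, the gates
# (theta, eta, alpha) are constants rather than data-dependent, and keys and
# values are given directly instead of being projected from tokens.
from typing import List

Matrix = List[List[float]]


def matvec(M: Matrix, x: List[float]) -> List[float]:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class SurpriseMemory:  # hypothetical name
    def __init__(self, dim: int, theta: float = 0.1, eta: float = 0.9, alpha: float = 0.01):
        self.M: Matrix = [[0.0] * dim for _ in range(dim)]  # memory weights, updated at test time
        self.S: Matrix = [[0.0] * dim for _ in range(dim)]  # running "surprise" (momentum)
        self.theta = theta  # step size on the momentary surprise (the gradient)
        self.eta = eta      # how much past surprise carries over
        self.alpha = alpha  # forgetting gate: decays old memory on every write

    def loss(self, k: List[float], v: List[float]) -> float:
        """Associative recall error ||M k - v||^2."""
        return sum((p - t) ** 2 for p, t in zip(matvec(self.M, k), v))

    def write(self, k: List[float], v: List[float]) -> float:
        """Take one gradient step on ||M k - v||^2 and return the pre-write loss.

        A large loss means the pair was surprising, so the gradient (and hence
        the write) is large; a well-remembered pair barely changes M.
        """
        err = [p - t for p, t in zip(matvec(self.M, k), v)]
        surprise = sum(e * e for e in err)
        for i, e in enumerate(err):
            for j, kj in enumerate(k):
                grad = 2.0 * e * kj  # d/dM_ij of ||M k - v||^2
                self.S[i][j] = self.eta * self.S[i][j] - self.theta * grad
                self.M[i][j] = (1.0 - self.alpha) * self.M[i][j] + self.S[i][j]
        return surprise
```

Writing the same pair repeatedly drives its recall loss toward zero, so later writes of that pair are small, while a pair the memory has never seen still produces a large loss and therefore a large write. Because of the forgetting gate, the loss settles slightly above zero rather than exactly at it.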

The second camp is more surprising: it makes memory active *without ever touching weights*. AgentFly reframes the whole agent as a Memory-augmented MDP in which credit assignment and policy improvement happen entirely through operations on case, subtask, and tool memories; the agent gets measurably better (87.88% on GAIA validation) while the underlying LLM stays frozen [Can agents learn continuously from experience without updating weights?]. Reflexion does the lightweight version: it stores verbal self-diagnoses of failures as episodic memory, so trial-and-error learning lives in retrievable text rather than gradients, and it deliberately keeps those reflections *uncompressed*, because compression would destroy their usefulness [Can agents learn from failure without updating their weights?]. VOYAGER generalizes this into a skill library: executable skills are indexed by embedding, composed into larger skills, and refined by environmental feedback, giving lifelong learning while sidestepping the catastrophic forgetting that weight updates cause [Can agents learn new skills without forgetting old ones?].
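A Reflexion-style loop can be sketched in a few lines. This is a toy under loud assumptions: `frozen_policy` is a hypothetical deterministic stand-in for a frozen LLM, feedback is a binary success signal, and the reflection wording is invented for illustration. What it shows is structural: the policy never changes between trials; only the text in episodic memory does.

```python
# Minimal Reflexion-style sketch. Assumptions: frozen_policy is a hypothetical
# stand-in for a frozen LLM, and the environment returns binary success.
from typing import Callable, List, Optional, Tuple


def frozen_policy(task: str, candidates: List[str], reflections: List[str]) -> str:
    """Hypothetical frozen model: its behavior is fixed and changes only
    through the reflection text it is given."""
    for action in candidates:
        if not any(f"'{action}' failed" in r for r in reflections):
            return action
    return candidates[-1]  # nothing left untried; fall back to the last option


def run_episodes(
    task: str,
    candidates: List[str],
    succeeds: Callable[[str], bool],
    max_trials: int = 5,
) -> Tuple[Optional[int], Optional[str], List[str]]:
    episodic_memory: List[str] = []  # reflections stored verbatim, never summarized
    for trial in range(1, max_trials + 1):
        action = frozen_policy(task, candidates, episodic_memory)
        if succeeds(action):  # unambiguous binary feedback
            return trial, action, episodic_memory
        # The only "learning" step: append a self-diagnosis as plain text.
        episodic_memory.append(
            f"Trial {trial}: '{action}' failed on '{task}'; do not repeat it."
        )
    return None, None, episodic_memory
```

Keeping each reflection verbatim matters here: if the memory were compressed into a summary such as "two attempts failed", the policy could no longer tell which actions to avoid.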

So the two camps are making opposite bets on the same question. The neural-memory camp says: make the memory differentiable and train it. The agent-memory camp says: if your memory is rich and well organized enough, *writing to it is the learning*, and gradient descent becomes optional. What readers may not expect is that the second camp treats frozen weights as a feature, not a limitation: freezing them is precisely how it dodges forgetting.
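A minimal way to see "writing to memory is the learning" is a nearest-neighbour learner, offered here as an analogy rather than as any of the cited systems: its fixed distance function plays the role of frozen weights, and its only update is an append. The class name `EpisodicKNN` is hypothetical.

```python
# Analogy sketch (not AgentFly, Reflexion, or VOYAGER): a learner whose only
# update is appending to memory; the similarity function never changes.
import math
from typing import List, Optional, Sequence, Tuple


class EpisodicKNN:  # hypothetical name
    def __init__(self) -> None:
        self.memory: List[Tuple[Tuple[float, ...], str]] = []

    def write(self, x: Sequence[float], label: str) -> None:
        """Learning step: store the example as-is; no parameters are adjusted."""
        self.memory.append((tuple(x), label))

    def predict(self, x: Sequence[float]) -> Optional[str]:
        """Return the label of the closest stored example (None if memory is empty)."""
        if not self.memory:
            return None
        _, label = min(self.memory, key=lambda item: math.dist(item[0], x))
        return label
```

Adding examples for a new, distant task leaves predictions near the old examples unchanged, which is the forgetting-avoidance property the agent-memory camp relies on; the cost is that memory grows with experience and retrieval quality depends entirely on the fixed similarity function.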

The most provocative note suggests this distinction may be softer than it looks. A formal result shows that environmental artifacts reduce the information an RL agent needs to represent its history, and that agents *spontaneously* turn their spatial environment into external memory: standard reward optimization alone leads path-following agents to develop memory-like behavior, with no memory objective written anywhere [Do RL agents accidentally use environments as memory?]. If memory can emerge as a side effect of training for something else entirely, then 'actively trained memory' isn't always a component you design; sometimes it's a structure that training discovers on its own.
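The intuition behind environment-as-memory can be shown with a toy corridor. This is a hand-written illustration, not the cited paper's construction or proof, and the policy is designed rather than learned by reward optimization: a reactive policy that sees only its current cell must walk to the far wall and stop back at the start. The function names and the corridor setup are hypothetical.

```python
# Toy illustration (hand-written, not the paper's construction): a memoryless
# policy that sees only its current cell can finish a round trip down a
# corridor only if it may leave marks behind in the environment.
from typing import List, Tuple

Obs = Tuple[str, bool]  # (which wall, if any; is this cell marked?)


def observe(pos: int, n: int, marks: List[bool]) -> Obs:
    wall = "left" if pos == 0 else "right" if pos == n - 1 else "middle"
    return wall, marks[pos]


def reactive_policy(obs: Obs) -> Tuple[bool, int]:
    """Maps the current observation alone to (leave_mark, move); move 0 means stop."""
    wall, marked = obs
    if wall == "left":
        return (True, +1) if not marked else (False, 0)
    if wall == "right":
        return (False, -1)
    # Middle cell: unmarked reads as "still outbound", marked as "returning".
    return (True, +1) if not marked else (False, -1)


def round_trip(n: int = 6, allow_marks: bool = True, max_steps: int = 100) -> bool:
    """True if the agent reaches the far wall and then stops back at the start."""
    marks = [False] * n
    pos = 0
    reached_far_wall = False  # evaluator bookkeeping; the policy never sees it
    for _ in range(max_steps):
        leave_mark, move = reactive_policy(observe(pos, n, marks))
        if move == 0:
            return reached_far_wall
        if leave_mark and allow_marks:
            marks[pos] = True
        pos += move
        reached_far_wall = reached_far_wall or pos == n - 1
    return False  # ran out of steps (e.g. oscillating next to the far wall)
```

Without marks, every middle cell looks the same on the way out and on the way back, so the same policy oscillates next to the far wall until it runs out of steps; with marks, the corridor itself stores the one bit of history ("already passed here") that the policy lacks.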


Sources (7 notes)

Can neural memory modules scale language models beyond attention limits?

Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.

Is long-context bottleneck really about memory or compute?

Research shows the bottleneck is not memory capacity but the compute required to consolidate evicted context into fast weights during offline sleep phases. Performance improves with more consolidation passes, following a test-time scaling pattern on harder reasoning tasks.

Can lookup memory and computation work together better than either alone?

Engram combines O(1) N-gram lookup with Mixture-of-Experts routing, revealing a U-shaped scaling law where balanced allocation to both mechanisms outperforms either alone. Gains appear largest in reasoning and code rather than pure retrieval.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Do RL agents accidentally use environments as memory?

A mathematical proof shows that environmental artifacts reduce the information RL agents need to represent their history. Path-following agents naturally develop memory-like behavior through standard reward optimization, satisfying situated-cognition criteria without explicit memory objectives.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about whether memory can shift from passive storage to actively trained component. Treat the following as dated findings (spanning 2023–2026), not current truth, and probe what has changed.

What a curated library found — and when (dated claims, not current truth):
• Neural long-term memory can learn *what to store* by prioritizing surprising tokens, scaling past 2M tokens without quadratic attention cost (Titans, ~2024–2025).
• The long-context bottleneck is compute to consolidate evicted context into fast weights during offline passes, not storage capacity; performance climbs with more consolidation (~2024).
• Frozen-weight agents can improve measurably (87.88% on GAIA validation) through credit assignment and policy refinement carried out entirely by memory operations, treating uncompressed episodic memory as the learning substrate rather than gradients (AgentFly, ~2025; Reflexion, ~2023).
• Skill libraries can compound through synthesis and environmental feedback, enabling lifelong learning while avoiding catastrophic forgetting from weight updates (VOYAGER, ~2023).
• RL agents spontaneously turn spatial environments into external memory as a side effect of reward optimization, with no explicit memory objective (~2026).
• Memory and computation are complementary sparsity axes trainable against each other via learned routing, not substitutes (~2025–2026).

Anchor papers (verify; mind their dates):
• Titans: Learning to Memorize at Test Time (2501.00663, Dec 2024).
• Conditional Memory via Scalable Lookup (2601.07372, Jan 2026).
• Artifacts as Memory Beyond the Agent Boundary (2604.08756, Apr 2026).
• Useful Memories Become Faulty When Continuously Updated (2605.12978, May 2026).

Your task:
(1) RE-TEST EACH CONSTRAINT. For each claim above, judge whether newer models, training methods, tooling, orchestration (multi-agent memory alignment, cross-agent caching), or evaluation have since relaxed or overturned it. Separate the durable question (can memory shift to an active, trainable substrate?) from perishable limitations (e.g., does uncompressed episodic memory still degrade under continuous update?). The May 2026 paper title suggests drift in stored memories — does this overturn the frozen-weight thesis, or refine it?
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Pay special attention to reconciling the two camps: do recent papers show that gradient-based memory training and frozen-weight learning are actually the same phenomenon under different regimes, or genuinely orthogonal?
(3) Propose 2 research questions that ASSUME the regime may have shifted: e.g., *if* memory update drift is now a known problem, how do agents maintain utility across continual writes? *If* emergent memory is ubiquitous in RL, can we design interventions to make that emergence faster or more controllable?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
