INQUIRING LINE

Can episodic memory alone enable learning without parameter updates?

This explores whether an agent can genuinely learn — improve at tasks over time — using only an external memory store, while its underlying model weights stay frozen.


The corpus answer is a fairly emphatic yes: several systems show real, measurable improvement with zero parameter updates, and the more interesting finding is that *how* the memory is shaped matters more than the fact of having one.

The clearest existence proofs come from agent systems that route all learning through memory operations. AgentFly reframes learning itself as a problem of credit assignment over memory rather than over weights, and reaches 87.88% on the GAIA validation set without touching the model [Can agents learn continuously from experience without updating weights?]. Reflexion shows the simplest version of the loop (an agent fails, writes itself a verbal note about why, and reads that note next time) and finds that an unambiguous success/failure signal is what keeps those reflections honest rather than rationalized [Can agents learn from failure without updating their weights?]. So far, episodic memory alone clearly *can* drive learning.
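
The Reflexion-style loop described above can be sketched in a few lines. This is a toy illustration, not Reflexion's implementation: `frozen_policy`, `environment`, the candidate actions, and the wording of the reflection are all hypothetical stand-ins. The one thing it is meant to show is that the policy function never changes between trials; only the list of reflections does.

```python
from typing import List


def frozen_policy(task: str, reflections: List[str]) -> str:
    """Stand-in for a frozen LLM (hypothetical): returns the first candidate
    action that no stored reflection mentions."""
    candidates = ["guess_a", "guess_b", "guess_c"]
    for action in candidates:
        if not any(action in note for note in reflections):
            return action
    return candidates[-1]


def environment(task: str, action: str) -> bool:
    """Unambiguous binary feedback. 'guess_c' is arbitrarily the right answer."""
    return action == "guess_c"


def run_trials(task: str, max_trials: int = 5) -> List[str]:
    reflections: List[str] = []  # episodic memory: the only state that changes
    history: List[str] = []
    for _ in range(max_trials):
        action = frozen_policy(task, reflections)
        history.append(action)
        if environment(task, action):
            break
        # On failure, write a verbal note that the next trial will read.
        reflections.append(f"On task '{task}', {action} failed; avoid it next time.")
    return history


print(run_trials("open the door"))
```

Each failed trial adds one note, so the same frozen function makes a different choice on the next attempt. A real system would replace the templated note with a model-written self-diagnosis.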

But the corpus immediately complicates 'alone.' The shape of the stored experience turns out to be decisive. Storing memories as causal abstractions, which record not just what happened but the conditions under which an action applies, beats generic reflection by 23 points on repeated trials and, crucially, carries a 4–17 point gain over to environments the agent never saw [Can frozen language models continually improve through memory structure alone?]. SkillRL pushes the same idea further: successes and failures shouldn't be stored the same way. Keep wins as concrete demonstrations; distill losses into abstract lessons [Should successful and failed episodes be processed differently?]. And VOYAGER shows memory can hold executable skills that compound into harder skills over a lifetime, sidestepping the catastrophic forgetting that weight updates cause [Can agents learn new skills without forgetting old ones?]. The lesson across these: 'episodic memory' is doing a lot of quiet work. Raw logs of past episodes don't learn; structured, abstracted, differentially processed memory does.
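
To make "structured" concrete, here is a minimal sketch of a memory store that combines two of the ideas above: entries carry applicability conditions, and wins and losses are stored differently. Everything in it is an assumption made for illustration (the `MemoryEntry` and `StructuredMemory` names, the key/value condition format, the templated lesson text); none of it is taken from the cited systems, which would use a model to write the abstractions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryEntry:
    kind: str                   # "demonstration" (from a win) or "lesson" (from a loss)
    conditions: Dict[str, str]  # when this entry applies
    content: str


@dataclass
class StructuredMemory:
    entries: List[MemoryEntry] = field(default_factory=list)

    def store_episode(self, conditions: Dict[str, str],
                      actions: List[str], success: bool) -> None:
        if success:
            # Wins are kept concrete: the full action sequence.
            kind, content = "demonstration", " -> ".join(actions)
        else:
            # Losses are abstracted; a template stands in for model-written distillation.
            when = ", ".join(f"{k}={v}" for k, v in conditions.items())
            kind, content = "lesson", f"Avoid '{actions[-1]}' when {when}."
        self.entries.append(MemoryEntry(kind, dict(conditions), content))

    def retrieve(self, state: Dict[str, str]) -> List[MemoryEntry]:
        # An entry is returned only if all of its conditions hold in the current state.
        return [e for e in self.entries
                if all(state.get(k) == v for k, v in e.conditions.items())]


memory = StructuredMemory()
memory.store_episode({"door": "locked"}, ["find key", "unlock door", "open door"], success=True)
memory.store_episode({"door": "locked"}, ["push door"], success=False)
memory.store_episode({"door": "open"}, ["walk through"], success=True)

hits = memory.retrieve({"door": "locked", "room": "kitchen"})
for entry in hits:
    print(entry.kind, "|", entry.content)
```

Because retrieval matches on conditions rather than on the whole past state, the two "locked door" entries still apply in a room the agent has not seen before, which is the kind of transfer a raw transcript does not give for free.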

Here's the thing you might not have known to ask: this reframes forgetting itself as an *allocation* problem rather than an unavoidable cost. Fast-Slow Training splits adaptation into slow weights and fast textual context and shows you reach equivalent performance 1.4–3x faster with far less forgetting, arguing forgetting is misallocation, not destiny [Can splitting adaptation into two channels reduce forgetting?]. That dovetails with evidence that even when you *do* update weights, RL only touches 5–30% of parameters, in a structured subnetwork that is nearly identical across random seeds [Does reinforcement learning update only a small fraction of parameters?]. This hints that much of what looks like 'learning' may be a small, externalizable adjustment that memory can stand in for.
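
For a sense of what "touches 5–30% of parameters" means operationally, the sketch below counts the fraction of values that differ between two checkpoints. It is an assumed, simplified measurement on flat lists of floats; the function name `update_fraction`, the tolerance parameter, and the toy numbers are hypothetical, and the cited work may define and measure sparsity differently.

```python
from typing import Sequence


def update_fraction(before: Sequence[float], after: Sequence[float],
                    tol: float = 0.0) -> float:
    """Fraction of parameters whose value changed by more than `tol`."""
    if len(before) != len(after):
        raise ValueError("checkpoints must have the same number of parameters")
    if len(before) == 0:
        raise ValueError("checkpoints must not be empty")
    changed = sum(1 for b, a in zip(before, after) if abs(a - b) > tol)
    return changed / len(before)


# Toy checkpoints: only indices 2 and 7 differ.
before = [0.5, -1.25, 0.0, 2.0, 0.75, -0.5, 1.0, 3.0, -2.0, 0.25]
after = [0.5, -1.25, 0.125, 2.0, 0.75, -0.5, 1.0, 2.5, -2.0, 0.25]

print(update_fraction(before, after))  # 0.2
```

Here two of ten values changed, so the update fraction is 0.2, inside the 5–30% band quoted above.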

Two deeper notes round out the territory. First, memory-as-learning can emerge without anyone designing it: RL agents provably offload information into their spatial environment, using the world as external memory just by optimizing reward [Do RL agents accidentally use environments as memory?]. Second, what counts as a useful memory unit matters: in-context learning of sequential decisions needs whole trajectories from the same setting, not isolated examples, to generalize without weight updates [Why do trajectories matter more than individual examples for in-context learning?]. Together the corpus suggests episodic memory alone *is* enough to learn, but only once you stop treating memory as a transcript and start treating it as a structured, curated artifact.
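
The point about memory units can be illustrated with a small context builder that selects whole episodes from the same environment level instead of sampling isolated steps. This is only a sketch under assumed conventions: the `build_context` name, the episode dictionary layout, the level labels, and the text format are all hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, str, float]  # (observation, action, reward)


def build_context(episodes: List[dict], level: str, max_episodes: int = 2) -> str:
    """Format up to `max_episodes` complete trajectories from one level as context."""
    chosen = [ep for ep in episodes if ep["level"] == level][:max_episodes]
    blocks = []
    for i, ep in enumerate(chosen, start=1):
        lines = [f"Episode {i} (level {level}):"]
        # Steps stay in order and together, so the trajectory is kept whole.
        lines += [f"  obs={o} action={a} reward={r}" for o, a, r in ep["steps"]]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


episodes = [
    {"level": "maze-1", "steps": [("start", "right", 0.0), ("corridor", "up", 1.0)]},
    {"level": "maze-2", "steps": [("start", "left", 0.0)]},
    {"level": "maze-1", "steps": [("start", "up", 0.0), ("dead end", "down", 0.0)]},
]

context = build_context(episodes, "maze-1")
print(context)
```

The resulting text would be prepended to the frozen model's prompt; the design choice being shown is that filtering happens at the episode level, never at the level of individual steps.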


Sources (9 notes)

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can frozen language models continually improve through memory structure alone?

Agents using causal-form memory (preserving applicability conditions) outperform generic reflection by 23 points on repeated trials and gain 4-17 points transferring to new environments, showing memory shape matters more than parameter updates.

Should successful and failed episodes be processed differently?

SkillRL demonstrates that treating successful episodes as concrete demonstrations and failures as abstracted lessons achieves state-of-the-art performance on complex tasks while using substantially less context than uniform approaches. The asymmetry mirrors human expert reasoning and avoids the degradation seen in uniform consolidation methods.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can splitting adaptation into two channels reduce forgetting?

Fast-Slow Training routes task-specific lessons into optimized prompts while keeping parameter updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting and plasticity loss, demonstrating that forgetting is a misallocation problem rather than an inherent cost.

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL updates only 5–30% of parameters, leaving the rest unchanged, without any explicit sparsity regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

Do RL agents accidentally use environments as memory?

Mathematical proof shows that environmental artifacts reduce information needed to represent history in RL agents. Path-following agents naturally develop memory-like behavior through standard reward optimization, satisfying situated cognition criteria without explicit memory objectives.

Why do trajectories matter more than individual examples for in-context learning?

In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains: *Can episodic memory alone enable learning without parameter updates?* A curated library (spanning 2023–2026) claims yes—but those claims are dated. Your job is to judge what still holds.

What a curated library found — and when (dated claims, not current truth):
• Frozen-weight agents reach 87.88% on GAIA validation via memory-routed credit assignment; verbal reflection plus unambiguous success/failure signals drives honest learning without touching the weights (~2024).
• Memory *structure* matters more than memory *fact*: causal abstractions beat generic reflection by 23 points and transfer 4–17 points to unseen environments; differential storage (concrete wins vs. abstract losses) outperforms uniform logging (~2024–2025).
• Executable skill memory compounds without catastrophic forgetting; RL touches only 5–30% of parameters in sparse subnetworks, suggesting much "learning" is actually small, externalizable adjustment (~2025).
• In-context learning of sequential decisions requires whole trajectories from the same setting, not isolated examples (~2023–2024).
• Agents unintentionally offload information into spatial environments as external memory (~2024).

Anchor papers (verify; mind their dates):
• arXiv:2501.00663 — Titans: Learning to Memorize at Test Time (2024-12)
• arXiv:2505.11711 — Reinforcement Learning Finetunes Small Subnetworks in Large Language Models (2025-05)
• arXiv:2604.08756 — Artifacts as Memory Beyond the Agent Boundary (2026-04)
• arXiv:2605.12484 — Learning, Fast and Slow: Towards LLMs That Adapt Continually (2026-05)

Your task:
(1) RE-TEST EACH CONSTRAINT. For each claim above, determine whether newer model capability, in-context window scaling, test-time compute (chain-of-thought variants, verifiable reasoning, cognitive tools), or memory systems (retrieval-augmented generation, vector databases, hybrid slow–fast adaptation) have since relaxed or overturned it. Separate the durable question (does *unfrozen weight* learning remain necessary for some task classes?) from the perishable limitation (does memory *structure* still beat raw episodic logs?). Where a constraint still holds, state it plainly.
(2) Surface the strongest work from the last ~6 months that *contradicts* the "memory alone suffices" narrative—especially any finding that parameter updates remain irreplaceable or that memory continuously updated by LLMs becomes faulty (hint: arXiv:2605.12978).
(3) Propose 2 research questions that *assume* the regime has moved: (a) Under what conditions do memory-only systems fail to learn, and what hybrid (memory + sparse weight) architectures recover? (b) Can a frozen LLM + unbounded episodic memory match a continually-finetuned model in long-horizon RL, and if so, what is the memory-scaling law?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
