
Can zero-weight drift through external memory replace parameter plasticity entirely?



This explores whether 'learning' can move entirely out of a model's weights and into external memory — and whether that swap is total, or whether some learning still has to happen in the parameters. The corpus leans surprisingly far toward 'yes, memory can carry most of the load,' but it also quietly marks where the substitution stops being clean.

The strongest evidence for substitution comes from agents that improve without ever touching their weights. AgentFly reframes the whole learning loop as memory operations: credit assignment and policy improvement happen in case, subtask, and tool memory, and with a frozen model it still reaches 87.88% on GAIA validation (Can agents learn continuously from experience without updating weights?). Reflexion applies the same trick to failure: an agent writes a verbal self-diagnosis into episodic memory and does better in the next episode, with no gradient step required (Can agents learn from failure without updating their weights?). VOYAGER pushes it further into lifelong territory, storing executable skills in a searchable library and composing new ones from old, which sidesteps the catastrophic forgetting that weight-update methods suffer (Can agents learn new skills without forgetting old ones?). So a real share of what we used to call 'plasticity' genuinely relocates outside the network.
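The Reflexion-style loop described above can be sketched as a toy, assuming a hypothetical four-action environment with a binary success signal and a stand-in `frozen_policy` that plays the role of the frozen LLM. The names `environment`, `frozen_policy`, `EpisodicMemory`, and `run` are illustrative, not Reflexion's actual code, and the reflections are templated strings rather than model-written diagnoses. The sketch shows one narrow thing: behavior improves across episodes while nothing in the policy is ever updated.

```python
from dataclasses import dataclass, field

# Hypothetical toy task: exactly one of these actions succeeds.
ACTIONS = ["open drawer", "open cabinet", "look under bed", "open fridge"]
TARGET = "open fridge"


def environment(action: str) -> bool:
    """Binary success/failure feedback, the unambiguous signal Reflexion relies on."""
    return action == TARGET


def frozen_policy(reflections: list[str]) -> str:
    """Stand-in for a frozen LLM: its output depends only on the context it is given.

    It picks the first action not ruled out by a stored reflection.
    Nothing in this function is ever trained or updated.
    """
    ruled_out = {r.removeprefix("Avoid: ") for r in reflections}
    for action in ACTIONS:
        if action not in ruled_out:
            return action
    return ACTIONS[0]


@dataclass
class EpisodicMemory:
    reflections: list[str] = field(default_factory=list)

    def reflect(self, action: str, success: bool) -> None:
        # On failure, store the self-diagnosis verbatim (uncompressed).
        if not success:
            self.reflections.append(f"Avoid: {action}")


def run(episodes: int) -> list[bool]:
    """Run several episodes; only the memory changes between them."""
    memory = EpisodicMemory()
    outcomes = []
    for _ in range(episodes):
        action = frozen_policy(memory.reflections)
        success = environment(action)
        memory.reflect(action, success)
        outcomes.append(success)
    return outcomes
```

Calling `run(5)` fails three times and then succeeds on every later episode, because each failed attempt leaves a reflection that the next episode reads. A real agent would have the LLM write the reflection text itself; the template here only stands in for that step.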

But the corpus also says the *shape* of the memory matters more than people assume, which is the first crack in 'replace entirely.' Frozen-model agents using causal-form memory (memory that records when a lesson applies, not just what happened) beat generic reflection by 23 points on repeated trials and transfer better to new environments (Can frozen language models continually improve through memory structure alone?). That suggests memory isn't a neutral substitute for weights; it has to be engineered carefully to recover what plasticity provided for free. Some researchers go as far as calling memory architecture the new scaling frontier, arguing that returns from restructuring memory now exceed returns from adding parameters (Has memory architecture replaced parameter count as the scaling frontier?). On the architecture side, Titans builds a learned neural-memory module into the model itself to scale past attention's limits, which blurs the tidy line between 'memory' and 'weights': its memory *has* parameters, and they adapt at test time (Can neural memory modules scale language models beyond attention limits?).
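To make the Titans point concrete, here is a minimal sketch of a surprise-driven memory update, simplified to a linear memory `W` in plain Python. It follows the general shape of the update described in the Titans paper: the gradient of an associative recall loss serves as "surprise", it accumulates with momentum, and a forgetting term decays old content. The fixed scalar gates `theta`, `eta`, `alpha`, the directly supplied key/value vectors, and the class name `LinearNeuralMemory` are simplifying assumptions; in Titans the gates are data-dependent, keys and values are learned projections of the input, and the memory can be a deeper network.

```python
# Minimal sketch of a Titans-style surprise-driven memory, simplified to a linear map.
# Assumptions: linear memory W, fixed scalar gates, keys/values supplied directly.

Vector = list[float]
Matrix = list[list[float]]


def matvec(W: Matrix, k: Vector) -> Vector:
    """Memory read: M(k) = W k."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def recall_grad(W: Matrix, k: Vector, v: Vector) -> Matrix:
    """Gradient of the associative loss ||W k - v||^2 w.r.t. W, i.e. 2 (W k - v) k^T."""
    err = [p - t for p, t in zip(matvec(W, k), v)]
    return [[2.0 * e * x for x in k] for e in err]


class LinearNeuralMemory:
    """Hypothetical class: memory weights that keep updating at test time."""

    def __init__(self, dim: int, theta: float = 0.1, eta: float = 0.9, alpha: float = 0.01):
        self.W = [[0.0] * dim for _ in range(dim)]  # memory parameters
        self.S = [[0.0] * dim for _ in range(dim)]  # accumulated ("past") surprise
        self.theta = theta  # step size on momentary surprise
        self.eta = eta      # decay of past surprise (momentum)
        self.alpha = alpha  # forgetting rate

    def write(self, k: Vector, v: Vector) -> float:
        """One test-time update; returns the momentary surprise (gradient norm)."""
        g = recall_grad(self.W, k, v)
        # S_t = eta * S_{t-1} - theta * grad
        self.S = [[self.eta * s - self.theta * gij for s, gij in zip(s_row, g_row)]
                  for s_row, g_row in zip(self.S, g)]
        # M_t = (1 - alpha) * M_{t-1} + S_t
        self.W = [[(1.0 - self.alpha) * w + s for w, s in zip(w_row, s_row)]
                  for w_row, s_row in zip(self.W, self.S)]
        return sum(x * x for row in g for x in row) ** 0.5

    def read(self, k: Vector) -> Vector:
        return matvec(self.W, k)
```

Writing the same key/value pair repeatedly drives the surprise down as `W` comes to store the association, while an unseen pair produces a large gradient again; that is the sense in which surprising inputs get written more strongly. Because of the momentum term the decrease is not monotonic: the surprise oscillates before settling.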

Here's the thing you might not have expected: even where weights still do the learning, most of them don't move. RL turns out to update only 5–30% of parameters, in nearly identical sparse subnetworks across random seeds, so plasticity is already far more concentrated and structured than 'retrain the whole net' implies (Does reinforcement learning update only a small fraction of parameters?). That reframes the whole question: it's not memory versus plasticity as two rival mechanisms, but a spectrum of how much you change inside versus how much you offload outside. Hybrid designs like SoftCoT make the trade explicit: keep the main LLM frozen, hand soft-thought generation to a small auxiliary model, and you keep pretrained knowledge while still getting new behavior (Can continuous reasoning avoid forgetting in instruction-tuned models?).
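The two measurements behind the RL sparsity claim, the fraction of parameters an update touches and how much two runs' update sets overlap, can be sketched over flattened parameter snapshots. This is an illustration under stated assumptions: the helper names are hypothetical, parameters are plain lists of floats rather than framework tensors, `tol` is a chosen numerical threshold (in practice the measured fraction depends on precision and tolerance), and Jaccard overlap is one reasonable similarity measure, not necessarily the metric the cited paper reports.

```python
# Hypothetical helpers; parameters are flat lists of floats, one entry per weight.


def update_mask(before: list[float], after: list[float], tol: float = 0.0) -> set[int]:
    """Indices of parameters whose value changed by more than tol."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same number of parameters")
    return {i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol}


def update_fraction(before: list[float], after: list[float], tol: float = 0.0) -> float:
    """Share of parameters the update touched (the cited finding: roughly 0.05-0.30 for RL)."""
    return len(update_mask(before, after, tol)) / len(before)


def overlap(mask_a: set[int], mask_b: set[int]) -> float:
    """Jaccard overlap of two update masks; 1.0 means identical subnetworks."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0
```

For a real model you would flatten each checkpoint's tensors in a fixed order before comparing. Under the cited finding, `update_fraction` against the base checkpoint would land around 0.05–0.30, and `overlap` between the masks of two seeds would be close to 1.0.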

So, entirely? The corpus says memory can replace weight updates for *acquiring and reusing experience*; that substitution is real and increasingly the default for agents. What it can't replace is the base model's underlying capabilities: every external-memory agent here rides on a frozen model that already knows how to read, reason, and act. Memory drift changes what an agent *does* with what it knows; it doesn't expand what it fundamentally *can* know. The honest answer is that external memory replaces plasticity for the outer loop of learning, while a small, structured core of parameter change, together with the frozen pretrained substrate beneath it, remains load-bearing.


Sources (8 notes)

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can frozen language models continually improve through memory structure alone?

Agents using causal-form memory (preserving applicability conditions) outperform generic reflection by 23 points on repeated trials and gain 4-17 points transferring to new environments, showing memory shape matters more than parameter updates.

Has memory architecture replaced parameter count as the scaling frontier?

Three converging signals in late-2025 research—taxonomy maturation, memory-aware test-time scaling loops, and hybrid sparsity laws—show that returns from restructuring memory now exceed returns from adding parameters. The design bottleneck has shifted from compute to memory structure.

Can neural memory modules scale language models beyond attention limits?

Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL updates only 5–30% of parameters, an intrinsic sparsity that arises without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

Can continuous reasoning avoid forgetting in instruction-tuned models?

SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question: **Can zero-weight drift through external memory replace parameter plasticity entirely?** This remains open, and the constraints may have shifted.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026 and show:
- Memory-based agents (AgentFly, Reflexion, VOYAGER) improve without weight updates, reaching competitive scores; memory appears to handle outer-loop learning (~2023–2025).
- Causal-form memory (recording *when* a lesson applies) outperforms generic reflection by 23 points and transfers better; how memory is structured matters as much as whether it is used (~2023).
- RL updates only 5–30% of parameters, in sparse subnetworks that are reproducible across seeds; plasticity is already far more concentrated than naive full retraining implies (~2025).
- Hybrid designs (SoftCoT) freeze the main model and delegate soft-thought generation to a small auxiliary model, preserving knowledge while enabling new behavior (~2025).
- External memory cannot replace the frozen pretrained substrate's fundamental capabilities; it reshapes what an agent *does* with what it knows, not what it *can* know (~2024–2026).

Anchor papers (verify; mind their dates):
- arXiv:2310.10134 (CLIN, 2023)
- arXiv:2501.00663 (Titans, 2024)
- arXiv:2505.11711 (RL Finetunes Small Subnetworks, 2025)
- arXiv:2604.08756 (Artifacts as Memory Beyond the Agent Boundary, 2026)

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, assess whether newer models (e.g., o1, GPT-4o, Claude 4), improved memory architectures (hierarchical, retrieval-augmented, compositional), orchestration advances (multi-agent caching, persistent memory systems), or fresh evaluations have **relaxed or overturned** each limit. Separate durable questions (e.g., *can frozen models' core capabilities be fundamentally extended without retraining?*) from perishable limits (e.g., *memory requires manual engineering*). State plainly which constraints have been resolved, and by what, and which still hold.
(2) Surface the strongest **contradicting or superseding work** from the last ~6 months that challenges the "memory substitutes for plasticity" thesis, or deepens it in unexpected ways.
(3) Propose **2 research questions** that assume the regime may have advanced: e.g., *Does continual fine-tuning of tiny auxiliary networks rival memory-drift in sample efficiency?* or *Can learned routing between memory and sparse weight updates beat hybrid design?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
