SYNTHESIS NOTE
Agentic Systems and Tool Use

Can frozen language models continually improve through memory structure alone?

If agents can't update parameters, what form of textual memory lets them keep learning across trials and transfer to new tasks without retraining?

Synthesis note · 2026-05-03 · sourced from Action Models

CLIN argues that the bottleneck for continual learning in language agents is not parameter updates but the structure of what gets remembered. Reflexion-style agents (see "Can agents learn from failure without updating their weights?") maintain "helpful hints": generic verbal reflections that work for the immediate trial but transfer poorly across tasks and environments. CLIN's wager is that a specific style of memory, causal abstractions of the form "opening doors may be necessary for movement between rooms", produces durable, transferable knowledge, because causal structure is what predicts which action to take next.
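To make the memory format concrete, here is a minimal sketch of one way to represent such entries, assuming each one is a (cause, relation, effect) triple rendered back into a sentence for the prompt. `CausalAbstraction`, `render_memory`, and both relation phrasings are hypothetical names for illustration; the first phrasing echoes the note's example, and the negative relation is an assumption, not CLIN's documented format.

```python
from dataclasses import dataclass

# Hypothetical relation phrasings. The first echoes the note's example;
# the second (a negative relation) is an illustrative assumption.
MAY_BE_NECESSARY = "may be necessary for"
DOES_NOT_CONTRIBUTE = "does not contribute to"


@dataclass(frozen=True)
class CausalAbstraction:
    """One memory entry: a cause, a causal relation, and the effect it bears on."""
    cause: str     # an action or precondition, e.g. "opening doors"
    relation: str  # one of the relation phrasings above
    effect: str    # the goal or state change the cause bears on

    def render(self) -> str:
        """Render the entry as the natural-language line the agent reads."""
        return f"{self.cause} {self.relation} {self.effect}"


def render_memory(entries: list[CausalAbstraction]) -> str:
    """Join entries into a bulleted text block for the frozen model's prompt."""
    return "\n".join(f"- {entry.render()}" for entry in entries)


memory = [
    CausalAbstraction("opening doors", MAY_BE_NECESSARY, "movement between rooms"),
    CausalAbstraction("looking around", DOES_NOT_CONTRIBUTE, "heating water"),
]
print(render_memory(memory))
```

Because the memory is plain text, editing it means adding, rewriting, or deleting lines; the model's weights never change.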

Empirically, the wager pays off. On ScienceWorld, CLIN outperforms state-of-the-art reflective agents such as Reflexion by 23 absolute points on repeated trials. More importantly, it transfers: zero-shot performance improves by 4 points on new environments (13 on new tasks), and continuing to update memory in the new setting adds a further 17 points (7 on new tasks). The causal-abstraction memory is therefore not just a within-task accelerator but a substrate for cross-environment generalization.

The conceptual move is to position language-model agents as a modern instantiation of action model learning, except that the action model is written in natural language and continually edited rather than learned as parameters. Useful causal knowledge persists across trials; unhelpful causal knowledge is dropped. This suggests a new architectural pattern: agents built on frozen models can still improve continually and rapidly, provided the memory representation has the right shape. The shape that matters is causal, not encyclopedic. That position is worth reading against "Can agents learn reusable sub-task routines from past experience?" (workflow-shaped memory) and "Does state-indexed memory outperform high-level workflow memory for web agents?" (state-action-shaped memory). All three notes target the same problem (what shape should agent memory take?) and disagree on the answer.
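The note says only that useful causal knowledge persists and unhelpful knowledge is dropped. CLIN performs this update with the language model itself; the sketch below substitutes a hypothetical integer score per entry so the keep-or-drop logic is explicit. `update_memory`, its parameters, and the plus-or-minus-one rule are all illustrative assumptions.

```python
from collections.abc import Iterable


def update_memory(
    scores: dict[str, int],
    used: set[str],
    improved: bool,
    proposed: Iterable[str] = (),
    drop_below: int = 0,
) -> dict[str, int]:
    """Return the memory after one trial (hypothetical scoring stand-in).

    Entries used in a trial that improved gain a point; entries used in a
    trial that did not improve lose a point. Unused entries are unchanged.
    Any entry whose score falls below `drop_below` is removed.
    """
    delta = 1 if improved else -1
    updated: dict[str, int] = {}
    for entry, score in scores.items():
        if entry in used:
            score += delta
        if score >= drop_below:
            updated[entry] = score
    # Abstractions newly proposed during this trial enter with a neutral score.
    for entry in proposed:
        updated.setdefault(entry, 0)
    return updated


door = "opening doors may be necessary for movement between rooms"
look = "looking around may be necessary for heating water"
memory = {door: 0, look: 0}
memory = update_memory(memory, used={door}, improved=True)   # door entry helped
memory = update_memory(memory, used={look}, improved=False)  # look entry did not
print(memory)  # {'opening doors may be necessary for movement between rooms': 1}
```

A score is only one possible proxy for "useful"; the point of the sketch is that the action model is edited between trials while the underlying model stays frozen.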

Why causal form survives where heuristic consolidation fails. Late-2025 evidence reframes CLIN's success. The pattern "opening doors may be necessary for movement between rooms" is not just a useful abstraction; it is an applicability-conditional. The phrase "may be necessary" preserves the circumstances under which the abstraction holds. Compare a heuristic summary such as "always open doors to make progress", which strips the condition. See "Does agent memory degrade when continuously consolidated?" for the empirical case that LLM-driven consolidation regresses below no-memory baselines precisely because it strips applicability conditions, and see the tension note "Strategy distillation helps when applicability conditions survive, and hurts when they are stripped" for the resolution hypothesis that CLIN exemplifies. CLIN's success and Reflexion's success may both reduce to the same axis: the question is not "raw or abstract?" but "does the form preserve the conditions of application?" Causal abstractions preserve them by syntactic design; heuristic summaries do not.
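A minimal sketch of the distinction, assuming a toy state with three hypothetical fields. In CLIN the condition lives implicitly in the wording ("may be necessary for movement between rooms"); here it is made explicit as a predicate so the effect of stripping it is visible. `State`, `Rule`, `suggest`, and the predicate itself are illustrative names and assumptions, not part of CLIN or the cited notes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class State:
    """Hypothetical toy state holding only the facts the rules inspect."""
    goal_room: str
    current_room: str
    closed_door_to_goal: bool


@dataclass(frozen=True)
class Rule:
    text: str
    applies: Callable[[State], bool]  # the applicability condition
    action: str


# Causal form: the advice stays tied to "moving between rooms via a closed door".
causal = Rule(
    text="opening doors may be necessary for movement between rooms",
    applies=lambda s: s.current_room != s.goal_room and s.closed_door_to_goal,
    action="open door",
)

# Heuristic summary: the same advice with the condition stripped.
stripped = Rule(
    text="always open doors to make progress",
    applies=lambda s: True,
    action="open door",
)


def suggest(rule: Rule, state: State) -> Optional[str]:
    """Return the rule's action only if its applicability condition holds."""
    return rule.action if rule.applies(state) else None


needs_door = State(goal_room="kitchen", current_room="hallway", closed_door_to_goal=True)
already_there = State(goal_room="kitchen", current_room="kitchen", closed_door_to_goal=False)

for rule in (causal, stripped):
    print(rule.text, "->", suggest(rule, needs_door), suggest(rule, already_there))
```

Both rules fire when a closed door blocks the goal room, but only the stripped heuristic still recommends opening a door once the agent has already arrived, which is the kind of misapplication the consolidation evidence points to.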


Original note title

causal abstractions in dynamic textual memory let frozen-model agents continually improve — outperforming Reflexion by 23 points without parameter updates