SYNTHESIS NOTE

Can agents learn better from their failures than successes?

Does storing reasoning strategies extracted from both successful and failed experiences improve agent learning compared to tracking only successes or raw trajectories? This matters because failures offer preventative lessons that successes alone cannot teach.

Synthesis note · 2026-05-18 · sourced from Memory

ReasoningBank (2509.25140) departs from prior agent-memory work along two axes at once. First, it stores strategy-level reasoning hints rather than reusable workflows, instance-level concepts, or raw trajectories. Second, it draws those strategies from both successful and failed experiences, with the outcome judged by the agent itself rather than by ground-truth labels. The combination matters because each axis on its own underperforms the joint version.

The strategy-level abstraction is what differentiates it from agent-workflow-memory approaches, which store procedural sequences. A reusable workflow says "to find a place's zip code, first search by name, then extract location, then look up zip." A strategy says "when an entity attribute is requested, identify which lookup primitive returns it most directly; chain only when a single primitive cannot suffice." Strategies generalize across tasks; workflows generalize across instances of the same task.
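The contrast between the two stored units can be pictured with a minimal Python sketch. The class names and fields (`Workflow`, `Strategy`, `applies_when`, `hint`) are illustrative assumptions, not the paper's schema; the example values are the ones quoted above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Workflow:
    """A procedural sequence tied to one task type (hypothetical schema)."""
    task: str
    steps: Tuple[str, ...]


@dataclass(frozen=True)
class Strategy:
    """A task-independent reasoning hint plus its trigger condition (hypothetical schema)."""
    applies_when: str
    hint: str


zip_lookup = Workflow(
    task="find a place's zip code",
    steps=("search by name", "extract location", "look up zip"),
)

attribute_lookup = Strategy(
    applies_when="an entity attribute is requested",
    hint=("identify which lookup primitive returns it most directly; "
          "chain only when a single primitive cannot suffice"),
)


def workflow_applies(workflow: Workflow, task: str) -> bool:
    # A workflow is reusable only for further instances of the same task.
    return workflow.task == task
```

The workflow matches only its own task string, while the strategy carries a condition that a zip-code lookup, a phone-number lookup, or any other attribute request could satisfy.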

Failure inclusion is what differentiates it from systems that store only successful trajectories. Failed experiences contribute preventative lessons: strategies that look promising but fail under specific conditions. The agent abstracts successes and failures alike into actionable principles. This addresses a known gap: success-only memory teaches what worked but never what to avoid.
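A small sketch of what a success-only memory leaves out, under stated assumptions: `MemoryItem`, `distill`, and `build_bank` are hypothetical names, and `distill` is a trivial stand-in for whatever model call actually abstracts a trajectory into a principle.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryItem:
    text: str            # the distilled, actionable principle
    from_success: bool   # True: what worked; False: what to avoid


def distill(summary: str, succeeded: bool) -> MemoryItem:
    # Stand-in for a model-based distillation step (assumption):
    # the lesson is simply prefixed according to the judged outcome.
    prefix = "Prefer: " if succeeded else "Avoid: "
    return MemoryItem(text=prefix + summary, from_success=succeeded)


def build_bank(
    experiences: List[Tuple[str, bool]], include_failures: bool
) -> List[MemoryItem]:
    # A success-only memory drops every failed experience before distillation.
    return [
        distill(summary, ok)
        for summary, ok in experiences
        if ok or include_failures
    ]


experiences = [
    ("query the most direct lookup first", True),
    ("paging through unfiltered results", False),
]
success_only = build_bank(experiences, include_failures=False)
with_failures = build_bank(experiences, include_failures=True)
```

In this toy setup only `with_failures` contains an "Avoid:" item, which is the preventative lesson the paragraph above describes.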

The deeper finding is memory-aware test-time scaling (MaTTS). Scaling test-time compute generates more rollouts per task; more rollouts generate diverse experiences; diverse experiences provide richer contrastive signals for distilling higher-quality memory; better memory guides subsequent scaling toward more promising rollouts. Memory and compute compound rather than substitute. This is a different scaling law from the parameter scaling law — accuracy improves with cumulative interaction history, not just with one-time training compute.
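The feedback loop described above can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: `matts_loop`, `toy_rollout`, and `toy_distill` are hypothetical names, and the toy functions stand in for an agent rollout and a contrastive distillation step.

```python
from typing import Callable, List, Tuple

Rollout = Tuple[str, bool]  # (trajectory summary, self-judged success)


def matts_loop(
    tasks: List[str],
    k: int,
    run_rollout: Callable[[str, List[str], int], Rollout],
    distill: Callable[[str, List[Rollout]], List[str]],
) -> List[str]:
    """Run k rollouts per task, distill them into memory, carry memory forward."""
    memory: List[str] = []
    for task in tasks:
        # Every rollout is conditioned on the memory accumulated so far.
        rollouts = [run_rollout(task, list(memory), i) for i in range(k)]
        # Successes and failures on the same task are distilled together,
        # which is where the contrastive signal comes from.
        memory.extend(distill(task, rollouts))
    return memory


def toy_rollout(task: str, memory: List[str], i: int) -> Rollout:
    # Toy stand-in for an agent: even-numbered attempts "succeed".
    return (f"{task}/attempt-{i}", i % 2 == 0)


def toy_distill(task: str, rollouts: List[Rollout]) -> List[str]:
    # Toy stand-in for distillation: record how the rollouts split.
    wins = sum(ok for _, ok in rollouts)
    return [f"{task}: {wins} of {len(rollouts)} rollouts succeeded"]
```

Raising `k` gives `distill` more rollouts to contrast per task, and the resulting memory is passed into every later rollout; that is the sense in which memory and compute compound.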

The implicit theory of learning: agents become more capable not by accumulating data but by accumulating judged distinctions. The self-judgment step is doing the work. ReasoningBank can label its own successes and failures because the agent has access to task-grounded signals (did the search return useful results? did the action achieve the subtask?), so labels emerge from interaction rather than annotation. This makes the approach scalable in deployment, not just in training.
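To make "labels emerge from interaction" concrete, here is a rule-based sketch of a self-judgment step. The `StepSignal` fields mirror the two example questions above; the all-steps-must-pass rule is an assumption made for illustration and is not taken from the source, which only says the agent judges its own outcomes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepSignal:
    search_returned_results: bool  # did the search return useful results?
    subtask_achieved: bool         # did the action achieve the subtask?


def self_judge(signals: List[StepSignal]) -> bool:
    """Label a trajectory from interaction signals alone, with no annotation."""
    # Assumed rule: an empty trajectory is a failure, and a success
    # requires every step to pass both checks.
    return bool(signals) and all(
        s.search_returned_results and s.subtask_achieved for s in signals
    )
```

Because the label depends only on signals the agent observes while acting, the same function can run in deployment, where no annotated outcome exists.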

The result reframes the relationship between memory and inference compute. Prior work treated them as separate dimensions; ReasoningBank shows they are coupled, and their coupling is itself a scaling axis.

Inquiring lines that read this note 23

This note is a source for these research framings.

How can AI systems learn from failures without cascading errors?

Why do agents confidently report success despite actually failing tasks?

How can AI agents autonomously learn and transfer skills across tasks?

How should agents balance memory condensation to optimize context efficiency?

Do agents prefer raw experience over condensed summaries of past actions?

Why do reward structures fail to shape long-term agent learning?

What constrains reinforcement learning's ability to expand model reasoning?

What failure modes do imitation and outcome methods each address?

How should memory consolidation strategies shape agent performance over time?

Why do successful and failed trajectories need different memory processing?

How does AI assistance affect human cognitive development and reasoning autonomy?

How do agents decide when to pause and reflect on their strategy?

How do training priors constrain what context information can override?

Why does negative experience transfer better than positive examples alone?

What coordination failures limit multi-agent LLM systems as they scale?

How do agent teams use shared failures to reduce redundant exploration?

Related concepts in this collection 4

Can agents learn reusable sub-task routines from past experience? Do web agents fail at long-horizon tasks because they cannot extract and reuse workflows shared across similar problems? This explores whether sub-task abstraction enables skill accumulation rather than task-by-task problem solving.
AWM stores procedural workflows; ReasoningBank abstracts higher to strategies that span tasks

Can frozen language models continually improve through memory structure alone? If agents can't update parameters, what form of textual memory lets them keep learning across trials and transfer to new tasks without retraining?
CLIN stores causal abstractions; ReasoningBank's strategy abstractions are a strategic cousin operating without environment-specific causal structure

Can agents learn from failure without updating their weights? Explores whether language models can improve through trial and error by storing reflections in episodic memory rather than fine-tuning. This matters because it suggests a fundamentally different path to agent adaptation.
Reflexion uses raw episodic reflection; ReasoningBank distills across episodes into transferable strategies

Does agent memory degrade when continuously consolidated? Can consolidating agent experiences into summaries actually harm long-term performance? Research on ARC-AGI tasks suggests continuous memory updates may reduce capability below the no-memory baseline.
direct tension: ReasoningBank claims consolidation works when done over strategies-with-conditions; faulty-memory paper shows consolidation regresses below baseline; resolution may be in *what* gets abstracted
