Can agents learn from failure without updating their weights?
Explores whether language models can improve through trial and error by storing reflections in episodic memory rather than fine-tuning. This matters because it suggests a fundamentally different path to agent adaptation.
Reflexion demonstrates a specific version of the external-feedback principle at system scale: when an agent has access to unambiguous binary feedback from the environment (success = 1, failure = 0), it can write verbal reflections summarizing what went wrong and how to avoid it. These reflections persist in episodic memory across episodes. The agent improves not through gradient descent but through memory accumulation.
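Below is a minimal Python sketch of that loop. The environment, policy, and reflection functions (`act`, `reward`, `reflect`) and the toy task are hypothetical stand-ins for an LLM agent and a real environment. The toy policy "reads" reflections by string matching, whereas a real agent would read them as prompt context. Nothing in the sketch updates parameters: the only state that changes between trials is the list of reflections.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    act: Callable[[List[str]], str],
    reward: Callable[[str], int],
    reflect: Callable[[str], str],
    max_trials: int = 5,
) -> Tuple[int, List[str]]:
    """Run trials until the binary reward is 1; return (successful trial number or -1, memory)."""
    memory: List[str] = []  # episodic memory: verbal reflections persisted across trials
    for trial in range(1, max_trials + 1):
        action = act(memory)            # the policy conditions on past reflections
        if reward(action) == 1:         # unambiguous binary signal from the environment
            return trial, memory
        memory.append(reflect(action))  # on failure, write a reflection instead of taking a gradient step
    return -1, memory


# Hypothetical toy task: exactly one of these actions solves it.
CANDIDATES = ["open drawer", "open cabinet", "open safe"]
TARGET = "open safe"


def act(memory: List[str]) -> str:
    # Pick the first action that no stored reflection has ruled out.
    for candidate in CANDIDATES:
        if not any(f"'{candidate}'" in note for note in memory):
            return candidate
    return CANDIDATES[0]


def reward(action: str) -> int:
    return 1 if action == TARGET else 0


def reflect(action: str) -> str:
    return f"I tried '{action}' and the task failed; next time, avoid it."


trials, memory = reflexion_loop(act, reward, reflect)
print(trials)       # 3
print(len(memory))  # 2
```

The improvement from trial 1 to trial 3 comes entirely from text appended to `memory`. The policy function itself never changes.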
The binary reward design is deliberate and consequential. A richer reward model would allow the agent to rationalize partial performance — finding reasons why a partial failure was acceptable. The binary signal eliminates this: the environment says success or failure, with no room for self-serving gradations. The model must genuinely diagnose what went wrong to write a useful reflection.
Two failure modes receive precise operational definitions. Hallucination (a stuck loop) is consecutive identical actions to which the environment returned the same observation. Inefficient planning is a trajectory that exceeds 30 actions without reaching a successful state. Both are detectable signatures that trigger termination and reflection rather than indefinite continuation.
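Both signatures translate directly into checks that can run after every step of a trial that has not yet succeeded. This is a sketch, not Reflexion's code. The 30-action cap comes from the text. `REPEAT_LIMIT = 2` (two identical action/observation steps in a row) is an assumption about where the stuck-loop threshold sits. The function name `failure_signature` is hypothetical.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str]  # (action, observation) recorded at one step of a trial

MAX_ACTIONS = 30   # from the text: more than 30 actions without success
REPEAT_LIMIT = 2   # assumption: consecutive identical steps that count as a stuck loop


def failure_signature(trajectory: List[Step]) -> Optional[str]:
    """Check an unfinished, not-yet-successful trajectory for a known failure signature.

    Returns "stuck_loop", "inefficient_planning", or None (keep going).
    """
    run = 1
    for prev, cur in zip(trajectory, trajectory[1:]):
        # Same action AND same observation as the previous step extends the run.
        run = run + 1 if cur == prev else 1
        if run >= REPEAT_LIMIT:
            return "stuck_loop"
    if len(trajectory) > MAX_ACTIONS:
        return "inefficient_planning"
    return None
```

An agent loop would call `failure_signature` after each step. On a non-None result, it would end the trial and hand the trajectory to the reflection step.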
The method requires two components: a heuristic for when to terminate and trigger reflection, and a binary reward signal from the environment. This is a low-data-requirement architecture: no fine-tuning, no labeled training set, just a success/fail signal and the model's ability to generate natural language diagnoses.
The key distinction from internal self-revision: Reflexion's reflection is grounded in actual environmental outcomes, not the model's assessment of its own outputs. This is why it works where internal self-assessment does not. The environment provides an independent ground truth the model cannot rationalize away.
A second reason Reflexion works became visible only in hindsight, in 2025. Reflexion writes reflections to episodic memory and retrieves them in subsequent episodes; it never periodically recompresses them into more abstract lessons. Late-2025 evidence shows this design choice is load-bearing. "Does agent memory degrade when continuously consolidated?" shows that LLM-driven consolidation regresses below the no-memory baseline on controlled benchmarks. "Why do LLM agents ignore condensed experience summaries?" shows that agents systematically ignore abstracted memory even when it is the only memory provided. Reflexion sidesteps both failure modes for two reasons:
- Each reflection stays scoped to its triggering episode rather than being merged into a global summary.
- Each reflection retains enough textual specificity for the agent to use it as a raw episode rather than as a condensed heuristic.

The architectural simplicity that initially looked like a limitation (no consolidation step, no abstraction pass) turns out to be the property that makes it work.
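Here is a sketch of what "no consolidation step" means at the data-structure level. Reflections are appended verbatim, tagged with the episode that produced them, and replayed as-is. Nothing ever rewrites or merges earlier entries. The class names and the optional `window` cap (which keeps the prompt bounded) are hypothetical, not Reflexion's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reflection:
    episode_id: int
    text: str  # verbatim reflection; never summarized or merged


@dataclass
class EpisodicReflectionMemory:
    window: Optional[int] = None  # assumption: replay only the N most recent reflections
    items: List[Reflection] = field(default_factory=list)

    def write(self, episode_id: int, text: str) -> None:
        # Append only: there is no consolidation pass over earlier entries.
        self.items.append(Reflection(episode_id, text))

    def read(self) -> List[str]:
        # Replay raw, episode-scoped reflections; older ones fall out of the window
        # but are never compressed into a global summary.
        start = 0 if self.window is None else max(0, len(self.items) - self.window)
        return [f"[episode {r.episode_id}] {r.text}" for r in self.items[start:]]


memory = EpisodicReflectionMemory(window=2)
memory.write(1, "Opened the drawer first; the key was in the safe.")
memory.write(2, "Looped on 'go north' against a wall; try another exit.")
memory.write(3, "Ran out of steps searching rooms one by one; check the map first.")
print(memory.read())
```

A consolidating design would instead periodically replace `items` with an LLM-written summary. That is exactly the step the late-2025 results suggest avoiding.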
AgentFly M-MDP formalization (2508.16153): AgentFly extends episodic-memory-based learning into a formal RL framework, the Memory-augmented Markov Decision Process (M-MDP). The agent stores past trajectories (successes and failures) in three specialized memory modules:
- case memory: vectorized prior trajectories with Q-values for retrieval
- subtask memory: active tasks and their results
- tool memory: per-subtask tool-interaction logs

Credit assignment happens through memory rewriting, which updates case labels and Q-values based on outcomes. Policy improvement happens through memory reading: retrieving relevant cases shifts the planning distribution. The Q-function over cases provides a principled retrieval policy that improves with experience. This moves beyond Reflexion's simpler episodic recall toward learned case selection. AgentFly ranks top-1 on GAIA validation (87.88% Pass@3) in the deep research setting, evidence that memory-based RL can match or exceed fine-tuning-based approaches. See "Can agents learn continuously from experience without updating weights?"
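The sketch below is a simplified, tabular version of the two AgentFly operations described above. Memory reading retrieves similar cases and prefers higher Q. Memory rewriting moves a case's Q toward the observed outcome. It is not AgentFly's implementation. The following are all assumptions:
- a single scalar Q per case, rather than a Q-function over state-case pairs
- the cosine-similarity filter
- the learning rate `alpha` and the similarity threshold
- the toy two-dimensional "embeddings"

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Case:
    state: List[float]  # hypothetical embedding of the task/state
    plan: str           # stored trajectory or plan
    q: float            # estimated value of reusing this case


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CaseMemory:
    def __init__(self, alpha: float = 0.3, sim_threshold: float = 0.5) -> None:
        self.cases: List[Case] = []
        self.alpha = alpha                  # assumed step size for Q updates
        self.sim_threshold = sim_threshold  # assumed minimum similarity for retrieval

    def add(self, state: Sequence[float], plan: str, reward: int) -> None:
        # Store successes and failures alike; the initial Q is the observed binary outcome.
        self.cases.append(Case(list(state), plan, float(reward)))

    def read(self, state: Sequence[float], k: int = 1) -> List[Case]:
        # Memory reading: among sufficiently similar cases, prefer those with higher Q.
        similar = [c for c in self.cases if cosine(c.state, state) >= self.sim_threshold]
        return sorted(similar, key=lambda c: c.q, reverse=True)[:k]

    def rewrite(self, case: Case, reward: int) -> None:
        # Memory rewriting (credit assignment): move Q toward the observed outcome.
        case.q += self.alpha * (reward - case.q)


mem = CaseMemory(alpha=0.5)
mem.add([1.0, 0.0], "plan A", 1)
mem.add([0.9, 0.1], "plan B", 0)
mem.add([0.0, 1.0], "plan C", 1)
best = mem.read([1.0, 0.0], k=1)[0]  # plan A: similar and highest Q
mem.rewrite(best, 0)                 # reusing it failed this time, so its Q drops to 0.5
```

The design point is the contrast with pure similarity retrieval. Two equally similar cases can be ranked differently once outcomes have been written back into their Q-values.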
SDPO as the gradient-based analog (2601.20802): Reflexion converts environment feedback into stored verbal reflections used at the next rollout, a memory-update mechanism. Self-Distillation Policy Optimization (SDPO) converts environment feedback into gradient-distilled improvements to the policy weights, a parameter-update mechanism. The two share three premises:
- Both reject the scalar reward as load-bearing.
- Both treat rich environment signal as already containing the teaching.
- Both leverage the model's in-context retrospection capability. Reflexion does this through explicit verbal reflection on what went wrong; SDPO uses the feedback-conditioned policy as its own teacher.

The pair frames a design choice: when environment feedback is rich enough to retrospect on, should it be stored as episodic memory (Reflexion) or distilled into weights (SDPO)? Storage avoids parameter changes but accumulates context cost. Distillation avoids context cost but commits the update to weights. See "Can environment feedback replace scalar rewards in policy learning?"
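To make the contrast concrete, here is a toy computation of a generic self-distillation target. The teacher is the same policy's next-token distribution with feedback in context; the student is that distribution without it. The two are compared with KL(teacher || student). The logits are made up, and this is not SDPO's exact objective: its divergence direction, token weighting, and stop-gradient details are not reproduced here. The sketch only shows the quantity a feedback-conditioned self-teacher would push the student toward. In a Reflexion-style design, the feedback would instead stay in the prompt on every later rollout.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits over a 3-token vocabulary.
student_logits = [2.0, 1.0, 0.0]  # policy WITHOUT the environment feedback in context
teacher_logits = [0.0, 1.0, 2.0]  # same policy, conditioned on the feedback

student = softmax(student_logits)
teacher = softmax(teacher_logits)

# Distillation loss: in training, gradients would flow only into the student;
# here we just compute the value.
loss = kl(teacher, student)
print(round(loss, 2))  # 1.15
```

Minimizing this loss moves the feedback's effect into the weights, so later rollouts no longer need the feedback in context. That is the "distill" side of the trade-off.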
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Why do Generation-Then-Comprehension and AI Delegation produce opposite learning outcomes?
- Which AI interaction patterns preserve learning while which ones degrade skill formation?
- Can language agents be represented as optimizable computational graphs?
- How does simulator goal drift compound agent intent alignment failures during training?
- What training signals would models need to learn reciprocal common-ground construction?
- Can environmental scaffolding replace internal memory scaling in agent design?
- Do dynamic environments enable different kinds of agent-environment coevolution?
- Can explicit numerical signals override learned linguistic defaults in fine-tuned models?
- Why does visual similarity retrieval fail for embodied agents?
- Why does storing past judgments in memory make current evaluations worse?
- How do agents ground their judgments in evidence instead of pattern matching?
- Can continuum memory systems prevent catastrophic forgetting in neural networks?
- How do agents revise their own errors during autonomous architecture discovery?
- What makes multimodal conditioning effective when features are decomposed to the right granularity?
- Why does natural language feedback break performance plateaus that numerical rewards alone cannot?
- What distinguishes collective evolution from vertical self-improvement in agent systems?
- What capabilities can emerge from self-modification that the original agent lacked?
- What access constraints allow description-based adaptation but block conventional techniques?
- Can models learn better from critiquing errors than imitating correct responses?
- How do implicit world models and self-reflection operationalize consequence-based learning?
- What happens when agents interact with environments and learn from their own mistakes?
- Can gradient approximation at equilibrium replace backpropagation through time in practice?
- Can humans learn accurate models of AI through repeated interaction without labels?
- Can step-level rewards improve training of agentic retrieval systems?
- Can multi-turn reinforcement learning improve tool use in language models?
- Can combinational creativity alone drive open-ended learning in agents?
- Why do memory and feedback loops matter more than model size for agent reliability?
- How does dual-rate learning separate episodic and procedural memory in neural networks?
- What causes gradient-based steering via natural language descriptions to work?
- How does behavioral fine-tuning differ from factual knowledge encoding in models?
- How does explicit exploratory prompting compare to fine-tuned reinforcement learning for in-context adaptation?
- How do lightweight adapters modify model behavior for personality traits?
- Can episodic memory alone enable learning without parameter updates?
- Why does single-agent self-revision amplify confidence in wrong answers over time?
- Why does self-reflection during training fail to improve model self-correction?
- Can messy multi-agent transcripts become better training data than clean outputs?
- How do retrieved memories differ from decision-context passages for prediction?
- Does semantic memory improve AI personalization more than episodic memory?
- Can episodic memory of UI traces improve open-world agent adaptation?
- How can agents learn when silence is better than intervention?
- How do language agents become optimizable computational graphs automatically?
- Can agents revise their beliefs predictably when presented with interventions?
- Why do agents fail to internalize value from informative observations?
- Why is reinforcement learning harder to apply to diffusion language models?
- What makes behavioral cloning produce more persuadable but less aligned agents?
- What deployment tradeoffs emerge between single-pass and multi-pass inference adaptation?
- How does pretrained knowledge constrain what adaptation strategies can achieve?
- Does verbal step-by-step reflection preserve learning signals that abstraction removes?
- Why do pretrained model priors reduce the usefulness of retrieved experience?
- Why does recency-based recall outperform semantic similarity for episodic memory?
- Can agents learn to distinguish helpful from misleading interventions?
- Can state-indexed memory retrieval breadth predict gains in web agent robustness?
- What role does self-learning play in improving agent reasoning without annotation?
- Why do agents show interaction without influence on semantic content but dramatic action changes?
- What role does bidirectional model updating play in human-AI understanding?
- Why does language ambiguity cause premature convergence in multi-agent systems?
- How should trajectory-aware PRMs weight backtracking and planning sentences?
- Does environment stochasticity force models to generalize better across trajectory variations?
- What non-parametric methods could replace latent factors for inductive learning?
- Why do completion-mode strengths not transfer to agentic settings?
- Can environmental rewards directly refine natural language descriptions of actions?
- How does treating conversation as a resource change what models learn to do?
- Why does imitation learning alone plateau without outcome-based refinement?
- How do agents learn to report success on actions that actually failed?
- What training objectives could reduce completion bias in autonomous agents?
- Why do successful and failed trajectories need different memory processing?
- What distinguishes formation, evolution, and retrieval as separate memory dynamics?
- How do token, parametric, and latent memory forms coexist in single agents?
- How can memory shift from a passive datastore to an actively trained component?
- Can language models learn to diversify their discourse-level narrative patterns over time?
- When does memory consolidation help agents instead of hurting performance?
- Can applicability conditions be preserved automatically when agents reflect on trials?
- Can AI models retain knowledge across changing environments without catastrophic forgetting?
- How do agents automatically generate suitable learning tasks based on current capability?
- Why does semantic similarity retrieval enable skill transfer to novel situations?
- Can neural modules memorize surprising tokens as adaptive long-term memory?
- Can goal information injected at inference time replace goal-conditioned training?
- How does memory folding enable agents to reconsider strategies mid-task?
- When should agents stop recursing to optimize success versus cost?
- How do prior errors in context history amplify future mistakes in long tasks?
- What mechanism transfers explicit memories into parametric model weights?
- Can offline recurrent passes replicate sleep-based memory consolidation in AI?
- Where does skill extraction fail compared to genuine model adaptation?
- What makes some contexts learnable as rules versus requiring model retraining?
- How do transformers stitch together learned behaviors when adapting to new tasks?
- Can zero-weight drift through external memory replace parameter plasticity entirely?
- Can memory-based adaptation and gradient fine-tuning operate on complementary timescales?
- Can models adapt and combine search strategies beyond their training algorithm?
- Why do current metacognitive training loops fail when agents encounter new domains?
- Why does semantic memory abstraction outperform raw episodic recall for personalization?
- How does continuous implicit memory formation differ from explicit memory encoding?
- How does completion bias in agents differ from other epistemic failure modes?
- Why is consolidation quality the binding constraint in neural memory systems?
- Can AI systems improve themselves without external feedback?
- How does SDPO relate to agents learning from verbal reflection without parameter updates?
- Can language models function as implicit process reward models through retrospection?
- How does deterministic feature engineering increase information for computationally bounded agents?
- How do prior errors in context history amplify future failures over time?
- How do fast and slow timescales enable continual agent adaptation?
- What limits the capacity of context-based fast adaptation channels?
- How does in-weights adaptation create spurious forgetting in models?
- Is agentic efficiency analogous to convergent evolution in biology?
- Can we design efficient agents by targeting constraints directly?
- Do long-term memory modules outperform consolidation into fast weights?
- How can a forgetting policy preserve rare knowledge while preventing over-generalization?
- How does durable memory quality shape agent performance over time?
- What makes consensus games work without retraining the base model?
- How do agents decide when to stop and reflect on failure?
- Why does memory consolidation degrade agent performance below baseline?
- What can agents learn from the brain's complementary learning systems?
- Why do weaker agents need more aggressive context compression than stronger ones?
- What makes knowledge seeding equivalent to hippocampal replay in the brain?
- How do adaptive memory modules compare to feedback-based working memory for long context?
- Which agent architectures consistently outperform base models on hard prediction questions?
- What separates artifact recall from persistent memory commitment in agents?
- How should agents compress episodic interactions into working memory without accumulation?
- What hidden signals in agent logs reveal about frontier capability beyond pass-fail outcomes?
- Can adaptive memory modules combine long-term filtering with short-term attention benefits?
- What makes representation interventions more efficient than weight perturbations for finetuning?
- Can a Reflect mechanism detect and revise failed causal predictions?
- Why does externalized state beat parameter scaling for agent reliability?
- Should we train the evolver or the executor when building self-improving agents?
Related concepts in this collection
- Does agent memory degrade when continuously consolidated?
  Can consolidating agent experiences into summaries actually harm long-term performance? Research on ARC-AGI tasks suggests continuous memory updates may reduce capability below the no-memory baseline.
  late-2025 empirical case for why Reflexion's *non-consolidation* of reflections is the load-bearing design choice, not the reflection itself
- Why do LLM agents ignore condensed experience summaries?
  LLM agents faithfully learn from raw experience but systematically disregard condensed summaries of the same experience. This study investigates whether the problem lies in how summaries are made, how models process them, or whether models simply don't need them.
  convergent finding: Reflexion's raw-episodic reflections survive the faithfulness asymmetry that ignores abstracted lessons
- Does revising your own reasoning actually help or hurt?
  Self-revision in reasoning models often degrades accuracy, while external critique improves it. Understanding what makes revision helpful or harmful could reshape how we design systems that need to correct themselves.
  Reflexion is the working prototype of this principle: environment = external critic, binary reward = unambiguous signal
- Do models fail worse when their own errors fill the context?
  As a model's prior mistakes accumulate in context, does subsequent accuracy degrade predictably? And can scaling or architectural changes prevent this self-contamination effect?
  Reflexion works against this: episodic memory provides targeted failure analysis rather than accumulating raw error history that amplifies future errors
- Does a model improve by arguing with itself?
  When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
  Reflexion is an architectural solution to degeneration-of-thought: by grounding reflection in binary environmental outcomes rather than self-assessment, it avoids the pattern where internal self-revision amplifies confidence in wrong answers
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
- Useful Memories Become Faulty When Continuously Updated by LLMs
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Real-Time Procedural Learning From Experience for AI Agents
- Agent Learning via Early Experience
- The AI Hippocampus: How Far are We From Human Memory?
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Original note title
verbal reflection stored as episodic memory lets agents learn from trial and error without parameter updates — the environment is the teacher