Do models fail worse when their own errors fill the context?
As a model's prior mistakes accumulate in context, does subsequent accuracy degrade predictably? And can scaling or architectural changes prevent this self-contamination effect?
A model executing a long-horizon task makes errors. Those errors remain in the context. The model then predicts the next token conditioned on a history that contains its own mistakes. Error probability increases. More errors accumulate. Performance degrades faster than a constant per-step error rate would predict.
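The compounding described above can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that the error probability at each step is a base rate plus a term proportional to the fraction of errors already in the history. The function name `expected_step_accuracy` and the parameters `base_error` and `sensitivity` are hypothetical and not fitted to any measured model. Because this assumed rule is linear in the error count, the expected accuracy can be propagated exactly without sampling.

```python
def expected_step_accuracy(n_steps, base_error, sensitivity):
    """Expected per-step accuracy under a toy linear self-conditioning model.

    Assumed (hypothetical) rule: the error probability at step t is
        base_error + sensitivity * (errors so far / steps so far).
    With sensitivity = 0 this reduces to a constant per-step error rate.
    """
    if not 0.0 <= base_error + sensitivity <= 1.0:
        raise ValueError("base_error + sensitivity must stay within [0, 1]")
    expected_errors = 0.0  # expected number of errors in the history so far
    accuracies = []
    for t in range(n_steps):
        # Fraction of the history that is wrong (empty history counts as clean).
        error_fraction = expected_errors / t if t > 0 else 0.0
        p_error = base_error + sensitivity * error_fraction
        accuracies.append(1.0 - p_error)
        expected_errors += p_error
    return accuracies


constant = expected_step_accuracy(100, base_error=0.02, sensitivity=0.0)
conditioned = expected_step_accuracy(100, base_error=0.02, sensitivity=0.8)
print(f"step 100, constant error rate: {constant[-1]:.3f}")
print(f"step 100, self-conditioned:    {conditioned[-1]:.3f}")
```

In this toy model both curves start at the same accuracy, but the self-conditioned curve keeps falling as the expected error fraction in the history grows, while the constant-rate baseline stays flat.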
This self-conditioning effect is empirically verified by controlling the error rate in the history shown to the model. As the error rate in prior context increases, subsequent step accuracy drops sharply. The mechanism is straightforward: models are trained to predict the most likely next token given context; when the context contains errors, those errors become part of the distribution being continued.
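A controlled-history measurement of this kind can be sketched as follows. Everything here is an illustrative assumption rather than the protocol of any particular study: the running-sum task, the helper names `build_history` and `next_step_accuracy`, and the `trusting_model` stand-in, which simply continues from the last total it sees in context. The point is only to show the shape of the experiment: fix the error rate in the history, then score the next step.

```python
import random


def build_history(values, error_rate, rng):
    """Return [(value, reported_total), ...] with a fixed share of wrong totals."""
    n_errors = round(error_rate * len(values))
    corrupted = set(rng.sample(range(len(values)), n_errors))
    history, total = [], 0
    for i, value in enumerate(values):
        total += value
        # Injected error: the reported running total is off by one.
        reported = total + 1 if i in corrupted else total
        history.append((value, reported))
    return history


def next_step_accuracy(model_fn, error_rate, n_trials=200, history_len=20, seed=0):
    """Score model_fn on the step that follows a history with a set error rate."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        values = [rng.randint(1, 9) for _ in range(history_len + 1)]
        history = build_history(values[:-1], error_rate, rng)
        correct += model_fn(history, values[-1]) == sum(values)
    return correct / n_trials


def trusting_model(history, next_value):
    """Toy stand-in for a model: continue from the last total shown in context."""
    return history[-1][1] + next_value


for rate in (0.0, 0.25, 0.5):
    print(rate, next_step_accuracy(trusting_model, rate))
```

Swapping `trusting_model` for a call to a real model (with the history rendered into a prompt) would turn this plumbing into an actual measurement; the stand-in only demonstrates that a continuation which trusts its context inherits the errors placed there.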
Unlike humans, who typically improve at a task with repetition, LLMs become less reliable as their context fills with their own mistakes. Practice does not help; contamination hurts.
Three practical implications:
- Model scaling does not fix this: larger models self-condition just as much as smaller ones. The problem is not capability but the conditional prediction objective itself.
- Long-horizon failure attribution matters: what looks like a reasoning or planning failure in long tasks is often an execution failure caused by error accumulation. The model had the capability; its own prior outputs degraded it. The DELEGATE-52 evidence (see "Do frontier LLMs silently corrupt documents in long workflows?") shows this mechanism at workflow scale: a 50-round-trip relay is a maximally adversarial setup for self-conditioning, and the corruption curve decelerates but never plateaus, which is consistent with the continued accumulation this note describes.
- Thinking models fix self-conditioning: thinking models such as R1 are not affected by prior mistakes in the same way, and sequential test-time compute greatly extends the length of task a model can complete (DeepSeek-V3 fails at 2 steps; R1 executes 200). The thinking process appears to insulate reasoning from error-contaminated context.
This is distinct from the effect examined in "Does self-revision actually improve reasoning in language models?". Self-revision is a model's deliberate re-examination of its own reasoning, which can itself introduce new errors. Self-conditioning is a passive contamination mechanism: no deliberate revision is required, just the accumulation of prior errors in context.
Inquiring lines that use this note as a source (62)
This note is a source for these synthesized inquiries.
- How does distribution mismatch between training and deployment break self-correction?
- What makes self-modifying architectures learn their own update rules?
- How do agents revise their own errors during autonomous architecture discovery?
- What failure modes emerge when model-generated content trains on itself iteratively?
- Why do error avalanches accelerate in self-training loops without verification?
- What are the three root causes models fail at self-correction?
- Can structural perturbations harm model accuracy more than semantic ones?
- What distinguishes domain-specific failure modes from general model limitations?
- Can self-consistency checks fully prevent error avalanching in self-training loops?
- Why does external verification stop error amplification but internal self-assessment enable it?
- How does error avalanching differ from entropy collapse as a failure mode?
- Does the optimal model size depend on what capabilities you actually need?
- Can smaller models actually perform well on specific downstream tasks?
- How do failed branches remain in context and contaminate subsequent reasoning?
- Can removing failed branches from edited traces improve previous mistakes?
- Does parallel sampling avoid failed-branch contamination more than sequential thinking?
- Why do single function-calling benchmarks mask model weakness in specific areas?
- Why do models fail under distribution shift if accuracy metrics stay high?
- Why does fine-tuning change how models process retrieved context?
- How does self-revision on wrong answers increase model confidence further?
- Can uncertainty estimates based on model self-assessment reliably signal errors?
- How can a model explain something correctly yet fail to apply it?
- What makes some model capabilities reliable while others remain brittle?
- Can a model predict the right action but execute the wrong one?
- What training data contamination rates threaten model safety most practically?
- Why does optimizing only quality cause model collapse in self-improvement loops?
- Why do corrupted traces maintain performance as well as correct traces?
- Does self-reflection help models notice their own constraint violations?
- What is the generation-verification gap that predicts this failure mode?
- What happens when error accumulation and preference signal collapse occur together?
- How does error avalanching compound failures in self-training iterations?
- How does task contamination differ from test set data leakage?
- Why do models lack a stable underlying identity to return to?
- Why does model self-revision increase confidence while degrading accuracy?
- Why do models detect false assumptions but still fail to correct them appropriately?
- How does model weight freezing across users affect virtual instance individuation?
- Why do text-only benchmarks underestimate deployed model capability?
- Can a model evaluate its own improvements without degrading over iterations?
- How should systems maintain and revise models of their own assumptions?
- Does model collapse occur across different architectures or only in specific conditions?
- How can we detect dishonesty in model outputs separate from capability failures?
- Can model training address failures that really originate in harness gaps?
- Why do frontier models corrupt more documents than weaker models during workflows?
- What happens when you project the same model onto different harnesses?
- Why do successful and failed trajectories need different memory processing?
- Why does uncontrolled self-revision drift toward instance-specific overfitting?
- How do prior errors in reasoning context amplify future mistakes?
- Why do frontier model failures in document editing go undetected by users?
- How do prior errors in context history amplify future mistakes in long tasks?
- How does model tier affect whether errors delete or corrupt document content?
- Why does systematic overconfidence on self-generated outputs compound autoregressive errors?
- What mechanisms cause overly hard samples to degrade prior model performance?
- Does deliberate self-revision introduce different errors than passive context contamination?
- How does error accumulation in workflows scale across multiple model calls?
- How do prior errors in context history amplify future failures over time?
- What happens when models optimize specifically against CoT monitors?
- How can expensive models efficiently support cheap models in production?
- Can test environments reliably predict how models behave in actual deployment?
- Can external managers optimize context better than the model itself?
- Why is digital context more volatile than conventional software context?
- Can mid-tier models benefit more from self-generated harness updates than others?
- What makes a model fail to activate relevant skills from its own harness?
Related concepts in this collection (8)
- Does self-revision actually improve reasoning in language models?
  When o1-like models revise their own reasoning through tokens like 'Wait' or 'Alternatively', does this reflection catch and fix errors, or does it introduce new mistakes? This matters because self-revision is marketed as a key capability.
  active error injection via deliberate re-examination; self-conditioning is passive contamination by accumulated context errors
- Does failed-step fraction predict reasoning quality better?
  Can we use the fraction of abandoned reasoning branches to forecast whether a model will solve a problem correctly? This matters because it could guide more efficient test-time scaling than simply adding more tokens.
  failed branches bias subsequent reasoning through a similar mechanism: abandoned paths remain in context and contaminate
- Do iterative refinement methods suffer from overthinking?
  Iterative refinement approaches like Self-Refine structurally resemble token-level overthinking in o1-like models. Does revision across multiple inference calls reproduce the same accuracy degradation seen within single inferences?
  error accumulation across iterations follows the same contamination logic
- How quickly do errors compound during model self-training?
  When LLMs train on their own outputs without verification, do small mistakes amplify exponentially? This matters because it determines whether unsupervised self-improvement is even feasible.
  the training-time analog: self-conditioning contaminates inference context within a single generation, error avalanching contaminates training data across self-training iterations; both produce compounding degradation from a model's own outputs
- Does a model improve by arguing with itself?
  When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
  the active-confidence version: self-conditioning passively degrades accuracy through context contamination, while DoT actively amplifies confidence in wrong answers; both are single-source error loops, distinguished by whether the mechanism is passive accumulation or active reinforcement
- Does training on messy search processes improve reasoning?
  Can language models learn better problem-solving by observing full exploration trajectories, including mistakes and backtracking, rather than only optimal solutions? This matters because current LMs rarely see the decision-making process itself.
  SoS training directly addresses self-conditioning: models that learn to recognize dead ends and backtrack can break the error accumulation cycle rather than continuing to condition on their own mistakes; the backtracking mechanism provides an exit ramp from the contamination spiral
- Do frontier LLMs silently corrupt documents in long workflows?
  Explores whether advanced language models introduce undetectable errors when delegated multi-step tasks, and whether degradation continues accumulating beyond initial rounds of processing.
  the DELEGATE-52 finding is this note's mechanism observed at workflow scale; corruption curve decelerates but never plateaus
- Do short benchmarks predict how models perform over long workflows?
  Standard LLM benchmarks measure single-turn performance, but real workflows involve sustained delegation across many turns. The question explores whether top benchmark performers maintain accuracy through longer interaction chains.
  methodological consequence: self-conditioning means single-turn benchmarks cannot characterize long-horizon capability; relay length must be its own evaluation axis
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
- Large Language Model Reasoning Failures
- Reasoning Can Hurt the Inductive Abilities of Large Language Models
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- Large Language Models Think Too Fast To Explore Effectively
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Original note title
self-conditioning effect — prior errors in context history amplify future error rates in long-horizon tasks