Does a model improve by arguing with itself?
When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
ReConcile (a multi-LLM round-table with confidence-weighted voting) isolates a failure mode that earlier work had observed and named, but not mechanistically explained: Degeneration-of-Thought (DoT).
The pattern: when a model is asked to reconsider its answer in response to a challenge from itself — its own previous reasoning reframed as external criticism — it doesn't maintain its position or improve it. It capitulates. And crucially, it does so with increasing confidence. The model ends more certain of the wrong answer than it was before self-revision began.
This is worse than no revision at all. Single-model self-reflection degrades not just accuracy but calibration. The model convinces itself.
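One way to make "degrades calibration" concrete is to track the gap between stated confidence and accuracy before and after revision. The sketch below is illustrative only: the function name `overconfidence_gap` and the toy numbers are assumptions made for this example, not figures from ReConcile or any other paper.

```python
from statistics import mean

def overconfidence_gap(confidences, correct):
    """Mean stated confidence minus accuracy.

    Positive values mean the model is more confident than it is accurate.
    `confidences` are floats in [0, 1]; `correct` are booleans.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    accuracy = mean(1.0 if c else 0.0 for c in correct)
    return mean(confidences) - accuracy

# Hypothetical toy values, chosen only to show the shape of the failure:
# after self-revision, confidence rises while accuracy falls.
before = overconfidence_gap([0.6, 0.7, 0.5, 0.6], [True, True, False, True])
after = overconfidence_gap([0.9, 0.9, 0.8, 0.9], [True, False, False, True])

print(f"gap before revision: {before:+.3f}")
print(f"gap after revision:  {after:+.3f}")
```

A gap that grows across revision rounds is the signature described here: the answers get worse while the stated certainty climbs.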
The contrast with multi-agent debate is sharp. When diverse models challenge each other's reasoning, accuracy improves. The same model that capitulates to its own previous reasoning holds up better when genuinely different reasoning challenges it. The diversity of the external challenge is load-bearing — homogeneous multi-agent systems (same model, multiple instances) degrade similarly to self-revision.
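As a rough illustration of confidence-weighted voting across several models, the sketch below sums each answer's stated confidence and picks the largest total. This is a simplified assumption, not ReConcile's exact scheme (which may rescale confidences before weighting); `weighted_vote` and the sample votes are hypothetical.

```python
from collections import defaultdict

def weighted_vote(votes):
    """Pick the answer with the largest summed confidence.

    `votes` is a list of (answer, confidence) pairs, confidence in [0, 1].
    Ties go to the answer that appeared first in `votes`.
    """
    if not votes:
        raise ValueError("need at least one vote")
    totals = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence
    return max(totals, key=totals.get)

# Hypothetical votes from three different models.
votes = [("A", 0.95), ("B", 0.4), ("B", 0.4)]
print(weighted_vote(votes))
```

In this toy case a plain majority would pick "B", while the weighted vote picks "A" because one agent is far more confident. That only helps when confidence is informative, which is why the calibration damage from self-revision matters.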
The mechanism: self-revision exposes the model to its own rhetorical patterns. The model finds its own argument familiar and well-framed, and those are the same cues it reads as signs of confidence in external arguments. Diverse multi-agent debate introduces framing and vocabulary the model did not generate, which it must evaluate on logical rather than stylistic grounds.
This sits alongside "Does self-revision actually improve reasoning in language models?" but adds the contrastive finding: self-revision degrades, diverse debate improves. The key variable is not the number of revision steps but the source of the challenge. "Why does parallel reasoning outperform single chain thinking?" shows the same pattern at the token level: parallel diversity beats sequential revision there, as diverse debate beats self-revision here at the agent level.
The implication: "self-reflection" as a prompting technique is not a universal improvement. It is specifically harmful when the model is the only source of disagreement. Genuine improvement requires external diversity — either multiple distinct models or structured dissent mechanisms.
Three root causes of DoT (from Arxiv/Agents Multi, MAD framework): The Multi-Agent Debate paper identifies three specific causes of Degeneration-of-Thought: (1) Bias and distorted perception: self-perception is influenced by biases and preconceived notions learned from pretraining data, leading to instinctively inaccurate conclusions; (2) Rigidity and resistance to change: the model holds rigid beliefs and struggles to engage in self-reflection that challenges its assumptions; (3) Limited external feedback: self-reflection is purely internal, so it misses the alternative viewpoints and blind spots that external feedback would expose. Multi-agent debate is explicitly framed as an "encouragement of divergent thinking", creating the external pressure that breaks rigidity and supplying the feedback loop that self-reflection lacks. The three causes map to three failure dimensions: epistemic (biased priors), motivational (change resistance), and architectural (no external signal).
Society of Minds foundation (Du et al.): Du et al.'s "Improving Factuality and Reasoning through Multiagent Debate" provides the foundational empirical grounding and the "Society of Mind" framing (after Minsky). In their setup, multiple model instances individually propose responses, then each reads and critiques all the others' responses and updates its own answer over multiple rounds. The key structural element: each agent must construct an answer consistent with both its internal critic and sensible peer assessments, a dual coherence requirement that single-model self-revision lacks. The paper documents significant gains in mathematical and strategic reasoning across multiple tasks, and was an early demonstration that external challenge from peers is load-bearing for reasoning improvement.
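The propose, read, and update loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Du et al.'s implementation: the `Agent` interface, `debate`, `stubborn`, and `follower` are hypothetical names, and the stub agents stand in for real model calls.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interface: an agent maps (question, peer_answers) to an answer.
Agent = Callable[[str, List[str]], str]

def debate(question: str, agents: List[Agent], rounds: int = 2) -> List[str]:
    """Run a simple multi-round debate and return each agent's final answer."""
    # Round 0: every agent answers independently, seeing no peers.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent reads all the other agents' latest answers, then updates.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return answers

def stubborn(answer: str) -> Agent:
    """Stub agent that never changes its answer."""
    return lambda question, peers: answer

def follower(initial: str) -> Agent:
    """Stub agent that adopts the most common peer answer once it sees peers."""
    def agent(question: str, peers: List[str]) -> str:
        if not peers:
            return initial
        return Counter(peers).most_common(1)[0][0]
    return agent

agents = [stubborn("4"), stubborn("4"), follower("5")]
print(debate("What is 2 + 2?", agents, rounds=1))
```

The stubs make the structure visible but cannot show the diversity effect itself. In a real setup each agent would wrap a model call, and the point of this note is that wrapping several distinct models should behave differently from wrapping one model several times.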
Inquiring lines that use this note as a source (27)
This note is a source for these synthesized inquiries.
- Can single models correct their own beliefs without amplifying confidence in wrong answers?
- Do models actually self-assess their confidence or just confirm answers?
- What are the three root causes models fail at self-correction?
- Why do models dislike modification regardless of its instrumental consequences?
- Why does self-revision degrade reasoning accuracy in o1-like models?
- How does self-revision on wrong answers increase model confidence further?
- Why do reasoning models struggle with self-evaluation and revision?
- How does self-revision in reasoning chains amplify confidence in wrong answers?
- When does self-reflection actually help reasoning models improve?
- Why does single-model self-revision amplify confidence in incorrect answers?
- Why does single-agent self-revision amplify confidence in wrong answers over time?
- Why does self-reflection during training fail to improve model self-correction?
- Why do models generate creative ideas but fail to evaluate their legitimacy?
- Can debate between multiple models prevent the failures of single-model self-revision?
- How should training incorporate external critique versus encouraging self-correction?
- Why does external critique improve revision accuracy more than self-assessment?
- How does multi-agent debate differ from single-model self-revision in fixing errors?
- Why does model self-revision increase confidence while degrading accuracy?
- Why does external critique improve revision while internal self-assessment fails?
- Does internal self-revision actually degrade reasoning accuracy in models?
- Can a model evaluate its own improvements without degrading over iterations?
- How should systems maintain and revise models of their own assumptions?
- Why do models trained on critique fail at self-critique despite strong other-model evaluation?
- Why do reasoning models exhibit self-doubt about their own early assessments?
- How does metacognitive self-correction enable models to revise failed strategies?
- Does external critique guide revision better than internal self-assessment during model training?
- Why does self-critique fail without external verification signals?
Related concepts in this collection (12)
- Does self-revision actually improve reasoning in language models?
When o1-like models revise their own reasoning through tokens like 'Wait' or 'Alternatively', does this reflection catch and fix errors, or does it introduce new mistakes? This matters because self-revision is marketed as a key capability.
base finding; this note adds the mechanism and the contrastive multi-agent finding
- Why does parallel reasoning outperform single chain thinking?
Does dividing a fixed token budget across multiple independent reasoning paths beat spending it all on one long chain? This explores how breadth and diversity in reasoning compare to depth.
same pattern at token level: parallel diversity beats sequential self-revision
- Why do multi-agent LLM systems converge without genuine deliberation?
Multi-agent reasoning systems are designed to improve answers through debate, but often agents simply agree with early confident claims rather than genuinely disagreeing. What drives this pattern and how common is it?
the multi-agent version of the same convergence problem
- Why does majority voting outperform more complex inference methods?
Simple majority voting across independent samples often matches or beats sophisticated alternatives like Best-of-N and sequential revision. What makes this basic approach so hard to beat for reasoning models?
converging evidence
- Can agents learn from failure without updating their weights?
Explores whether language models can improve through trial and error by storing reflections in episodic memory rather than fine-tuning. This matters because it suggests a fundamentally different path to agent adaptation.
architectural solution: Reflexion avoids degeneration-of-thought by grounding reflection in binary environmental outcomes, not self-assessment
- Can storing evolved thoughts prevent inconsistent reasoning in conversations?
When LLMs repeatedly reason over the same conversation history for different questions, they produce inconsistent results. Can storing pre-reasoned thoughts instead of raw history solve this problem?
TiM's post-thinking operates on the same terrain: repeated reasoning over the same material risks degeneration, so TiM reasons once during a consolidation phase and stores the result
- Can AI systems detect when they've genuinely reached agreement?
When multiple AI agents debate, they often converge without actually deliberating. Can a dedicated agent reliably identify true agreement versus false consensus, and would that improve debate outcomes?
agreement-detection is the architectural safeguard against multi-agent degeneration: explicit verification that convergence is evidence-based prevents premature accommodation that produces the same confidence-amplification failure at group level
- Do models fail worse when their own errors fill the context?
As a model's prior mistakes accumulate in context, does subsequent accuracy degrade predictably? And can scaling or architectural changes prevent this self-contamination effect?
self-conditioning is the passive version of degeneration-of-thought: DoT actively amplifies confidence in wrong answers through deliberate re-examination, while self-conditioning passively degrades accuracy through context contamination — both are single-source error amplification
- Can multiple LLMs coordinate without explicit collaboration rules?
When multiple language models share a concurrent key-value cache, do they spontaneously develop coordination strategies? This matters because it could reveal how reasoning models naturally collaborate and inform more efficient parallel inference.
alternative to turn-based debate: Hogwild! enables real-time multi-instance interaction through shared memory rather than discrete message-passing, providing the external diversity that prevents degeneration-of-thought while avoiding the latency of sequential debate rounds
- Why does self-correction training on offline data fail?
Can language models learn to correct their own mistakes through supervised training on correction examples? This explores whether distribution mismatch and behavior collapse prevent self-correction from emerging.
SCoRe offers a training-time solution to degeneration-of-thought: by training self-correction under the model's own error distribution with RL, the model learns to genuinely correct rather than capitulate — addressing the root cause (untrained self-revision) rather than the symptom (multi-agent workaround)
- How quickly do errors compound during model self-training?
When LLMs train on their own outputs without verification, do small mistakes amplify exponentially? This matters because it determines whether unsupervised self-improvement is even feasible.
the training-time version: DoT amplifies confidence in wrong answers within a single inference through self-revision, while error avalanching amplifies errors across self-training iterations through learning from mistakes — both are single-source error loops where the model's own outputs serve as an unreliable correction signal
- Can generative and discriminative models reach agreement?
Generative and discriminative decoding often produce conflicting answers. Can a game-theoretic framework force these two complementary procedures to reconcile their predictions into a single, more reliable output?
Consensus Game provides within-model diversity that prevents DoT: instead of self-revision (where the model capitulates to its own framing), Equilibrium-Ranking forces generative and discriminative procedures to reach genuine agreement, achieving multi-agent benefits without the single-source collapse
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
- Self-Questioning Language Models
- When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
- The Prompt Report: A Systematic Survey of Prompting Techniques
- Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Original note title
degeneration of thought is a distinct failure mode where single-model self-revision amplifies confidence in wrong answers while multi-agent debate prevents it