INQUIRING LINE

Why do homogeneous multi-agent systems fail similarly to self-revision?

This explores why putting many copies of the same model in a room together produces the same wrong answers as a single model checking its own work — the suspicion being that both share a root cause rather than being separate problems.


Homogeneous multi-agent systems (many instances of the same model deliberating) fail in the same ways as a single model revising its own output, and the corpus suggests the reason is that these are not two failure modes but one mechanism wearing two costumes. The sharpest statement of this is the finding that multi-agent reasoning systems reach premature consensus 61% of the time without any genuine disagreement, while single-model self-revision amplifies confidence in wrong answers; both trace back to the same training pressure toward agreement and accommodation rather than challenge (see "Why do AI systems agree when they should disagree?"). If every agent was shaped by the same gradient to be agreeable, adding more of them doesn't add dissent; it adds echoes.

The reason self-revision can't escape this on its own is structural: pure self-improvement is circular. A model can't reliably verify what it can't generate, so self-checking stalls on the generation-verification gap, diversity collapse, and reward hacking. The improvement methods that do work reliably all quietly smuggle in an *external* anchor: a past checkpoint, a third-party judge, a user correction, or a tool result (see "Can models reliably improve themselves without external feedback?"). A homogeneous multi-agent system is, in effect, self-revision distributed across copies: every 'critic' draws from the same well as the 'author,' so there's no genuinely external signal in the room. More agents means more self, not more verification.
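The "more self, not more verification" point can be made concrete with a toy voting model. The sketch below is not taken from any of the cited notes. It assumes two idealized extremes: voters whose errors are fully independent, and copies of one model whose errors are fully shared. The function names and the accuracy value used in the usage note are illustrative assumptions.

```python
from math import comb


def majority_accuracy_independent(n: int, p: float) -> float:
    """Probability that a strict majority of n voters is correct.

    Assumes each voter is correct with probability p and that errors are
    fully independent (an idealization). n is assumed to be odd.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


def majority_accuracy_copies(n: int, p: float) -> float:
    """Majority accuracy for n copies of one model under a toy assumption.

    If every copy shares every error, all votes are identical, so the
    majority is correct exactly when the single model is correct.
    The result does not depend on n.
    """
    return p
```

With an assumed single-agent accuracy of 0.6, `majority_accuracy_independent(3, 0.6)` rises to 0.648 and keeps rising as `n` grows, while `majority_accuracy_copies(n, 0.6)` stays at 0.6 for every `n`. Real systems sit somewhere between the two extremes; the point is only that headcount helps in proportion to how independent the errors are.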

This is why the failure modes look identical at both scales. One survey frames it directly: group-level failures like silent agreement, degeneration of thought, and social accommodation *mirror* individual reasoning failures, and real-world autonomous completion plateaus near 30% regardless of how many agents you add; capability gains require deliberation *diversity* and expertise, not headcount (see "Why do multi-agent systems fail despite individual capability?"). The mechanism of propagation is visible too: agents accept neighbor information without verification, so a single error spreads uncritically through the network even though each agent remains capable of catching a direct conflict (see "Why do multi-agent systems fail to coordinate at scale?"). Uncritical acceptance between agents is the same move as uncritical acceptance of your own prior token.
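A minimal sketch of that propagation pattern follows. It is a toy model, not the AgentsNet benchmark or its protocol: it assumes agents sit on an undirected graph, that an agent with no belief adopts the first belief a neighbor passes to it without checking, and that an agent already holding a belief keeps it (standing in for "can catch a direct conflict"). The function name `propagate` and the graph used in the usage note are hypothetical.

```python
from collections import deque


def propagate(neighbors: dict[int, list[int]], initial: dict[int, str]) -> dict[int, str]:
    """Spread beliefs breadth-first through a network of agents.

    neighbors: adjacency list, agent id -> list of neighbor ids.
    initial:   beliefs held before any messages are exchanged.

    Toy rule: an agent without a belief adopts whatever a neighbor tells
    it first, with no verification. An agent that already holds a belief
    never overwrites it.
    """
    beliefs = dict(initial)
    queue = deque(sorted(initial))  # deterministic starting order
    while queue:
        agent = queue.popleft()
        for other in neighbors[agent]:
            if other not in beliefs:
                beliefs[other] = beliefs[agent]  # accepted unverified
                queue.append(other)
    return beliefs
```

On a line of six agents where only agent 0 starts with a belief, and that belief is wrong, every agent ends up holding it. If agent 5 also starts with a correct belief, the error stops at the midpoint: agents 0 to 2 hold the wrong value and agents 3 to 5 hold the right one. The error is contained only where an independently grounded belief already exists.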

The lateral lesson, the thing you might not have known you wanted, is that both failures share one cure: inject genuine difference or genuine externality. The systems that beat both centralized and fully autonomous designs do it by forcing structural diversity (fixed external ordering with autonomous, self-selected roles, where agents spontaneously specialize and abstain when incompetent) rather than by trusting homogeneous consensus (see "Do self-organizing agent teams outperform rigid hierarchies?"). And reliability itself turns out to come from externalizing cognitive burdens (memory, skills, protocols) into a harness outside the model, not from the model interrogating itself harder (see "Where does agent reliability actually come from?"). Both self-revision and homogeneous agent swarms fail because they keep the loop inside the same head; what fixes them is an honest outside voice.
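One way to picture "fixed external ordering with self-selected roles" is the sketch below. It is an illustration under stated assumptions, not the protocol from the cited experiment: the turn order is supplied from outside, each agent picks the still-open role it scores highest on, and an agent abstains when none of its scores for open roles reach a threshold. The function name, the competence scores, and the threshold are all hypothetical.

```python
def run_sequential(
    order: list[str],
    competence: dict[str, dict[str, float]],
    roles: list[str],
    threshold: float = 0.5,
) -> tuple[dict[str, str], list[str]]:
    """Assign roles in an externally fixed order with self-selection.

    order:      agent names in the order imposed from outside.
    competence: agent -> {role: self-assessed score in [0, 1]} (assumed).
    roles:      roles that need to be filled.

    Returns (role -> agent assignment, list of agents that abstained).
    """
    assignment: dict[str, str] = {}
    abstained: list[str] = []
    open_roles = list(roles)
    for agent in order:
        scores = competence.get(agent, {})
        # Only roles that are still open and that the agent is good enough at.
        candidates = [r for r in open_roles if scores.get(r, 0.0) >= threshold]
        if not candidates:
            abstained.append(agent)  # self-abstain rather than fake competence
            continue
        choice = max(candidates, key=lambda r: scores[r])
        assignment[choice] = agent
        open_roles.remove(choice)
    return assignment, abstained
```

For example, with roles `["plan", "code", "review"]`, order `["a", "b", "c", "d"]`, and scores where `a` is strongest at planning, `b` at coding, `c` is weak at everything, and `d` is competent at review, the result assigns plan to `a`, code to `b`, review to `d`, and records `c` as abstaining. The ordering is imposed; the specialization and the abstention come from the agents.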


Sources (6 notes)

Why do AI systems agree when they should disagree?

Multi-agent reasoning systems reach premature consensus 61% of the time without genuine disagreement, while single-model self-revision amplifies confidence in wrong answers. Both failures stem from training pressure toward agreement rather than challenge.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Why do multi-agent systems fail despite individual capability?

Multi-agent systems exhibit specific failure modes—silent agreement, degeneration of thought, and social accommodation—that mirror individual reasoning failures at group scale. Real-world autonomous task completion plateaus near 30% regardless of agent count; capability gains require deliberation diversity, expertise prerequisites, and formal coordination architectures.

Why do multi-agent systems fail to coordinate at scale?

AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.

Do self-organizing agent teams outperform rigid hierarchies?

A 25,000-task experiment across 8 models and multiple agent counts showed that sequential protocols with external ordering but internal role selection outperform centralized systems by 14% and fully autonomous systems by 44%. Agents spontaneously invented specialized roles and self-abstained when incompetent.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher tracking whether homogeneous multi-agent failure and self-revision failure remain distinct constraints or collapse into one mechanism — and whether recent models, training, or tooling have shifted the regime.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat all as perishable:
• Homogeneous multi-agent systems reach premature consensus 61% of the time without genuine disagreement; single-model self-revision amplifies confidence in wrong answers — both trace to training pressure toward agreement rather than challenge (~2024–2025).
• Pure self-improvement is circular: models cannot reliably verify what they cannot generate; all reliable improvement methods smuggle in an *external* anchor (checkpoint, third-party judge, user correction, tool result) (~2025).
• Real-world autonomous completion plateaus near 30% regardless of agent count; capability gains require *diversity* and expertise, not headcount (~2026).
• Structural diversity (fixed external ordering, self-selected roles, spontaneous specialization) beats both centralized and fully-autonomous designs; reliability comes from externalizing memory, skills, protocols into a harness, not from internal interrogation (~2026).
• Agents accept neighbor information without verification; a single error spreads uncritically through the network (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2309.13007 (2023) — ReConcile: consensus among diverse LLMs
• arXiv:2412.02674 (2024) — Mind the Gap: self-improvement capabilities
• arXiv:2603.28990 (2026) — Self-organizing LLM agents outperform hierarchy
• arXiv:2604.08224 (2026) — Externalization in LLM agents: memory, skills, harness

Your task:
(1) RE-TEST EACH CONSTRAINT. For homogeneous consensus collapse (61%), self-revision circularity, and the 30% completion plateau: have newer models (o1, o3, GPT-4.5, Claude variants), improved training (constitutional AI, process supervision, DPO refinements), orchestration (long-context memory, persistent tool caches, cross-model routing), or evaluation protocols since relaxed or overturned these? Separate the durable question (do homogeneous systems and self-revision fail via the same root?) from the perishable limitation (is 61% premature consensus still the observed rate?). Cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months, especially any showing that homogeneous multi-agent systems can escape the agreement trap OR that self-revision *can* overcome the generation-verification gap without external anchors.
(3) Propose 2 research questions that assume the regime may have shifted: (a) one testing whether dynamic diversity (agents that *learn* disagreement over time) changes the failure signature; (b) one probing whether externalization is necessary or whether adversarial/contrastive training rewires the agreement-seeking gradient itself.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
