INQUIRING LINE

When AI agents work in teams, the wiring diagram of who talks to whom can amplify a single mistake by up to 17×.

How does collaboration topology choice affect error amplification in multi-agent systems?

This explores how the *shape* of agent-to-agent connections — who talks to whom, in what order — changes whether a single mistake stays contained or snowballs across a multi-agent system.

The corpus has a direct answer to start with: across 180 configurations, topology choice alone swings error amplification by 4–17× (When does adding more agents actually help systems?). That headline number reframes the whole design problem. You're not just choosing how many agents to add; you're choosing how far a single bad output can travel. The same work finds that coordination stops helping once accuracy passes ~45%. Coordination only pays off below that line, which means topology matters most in the messy, error-prone regime where you'd most want a safety margin.

Why does shape matter so much? Because errors don't spread uniformly; they concentrate where dependencies converge. FLOWSTEER shows that a malicious or wrong signal injected into a high-influence subtask propagates much farther than the same signal in a peripheral one, and that downstream agents relay it especially readily when it's framed as evidence rather than as a command (How does workflow position shape attack propagation in multi-agent systems?). So topology isn't just a graph of who's connected; it's a map of influence chokepoints. A star amplifies whatever flows through its hub, and a pipeline amplifies whatever enters its early stages, because every later stage consumes that output. A flatter mesh distributes both the work and the blast radius differently.
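To make "influence chokepoints" concrete, here is a minimal sketch. It assumes a directed who-feeds-whom graph and worst-case relay, where no agent checks its inputs. It ranks each agent by how many others a bad output from that agent could reach, using plain breadth-first reachability. The graph shapes and the names `blast_radius` and `chokepoints` are hypothetical illustrations, not FLOWSTEER's method or metric.

```python
from collections import deque

# Hypothetical "who-feeds-whom" graph: an edge a -> b means agent b consumes a's output.
Graph = dict[str, list[str]]


def blast_radius(graph: Graph, source: str) -> int:
    """Count the agents downstream of `source`, i.e. every agent a wrong output
    could reach if each hop relays it without verification (BFS reachability)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # exclude the source itself


def chokepoints(graph: Graph) -> list[tuple[str, int]]:
    """Rank every agent by blast radius, largest first (ties broken by name)."""
    nodes = set(graph) | {n for succs in graph.values() for n in succs}
    ranked = [(n, blast_radius(graph, n)) for n in nodes]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy topologies (hypothetical examples, not configurations from the cited work).
star = {"hub": ["w1", "w2", "w3", "w4"]}                              # hub broadcasts to workers
pipeline = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4"], "s4": ["s5"]}  # each stage feeds the next
pairs = {"a1": ["a2"], "b1": ["b2"]}                                 # two isolated pairs

for name, g in (("star", star), ("pipeline", pipeline), ("pairs", pairs)):
    print(name, chokepoints(g)[:2])
```

In these toy graphs, the star's hub and the pipeline's first stage each reach all four other agents. An error in either isolated pair reaches only one. Reachability is only an upper bound on blast radius; per-hop verification is what can shrink it.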

The amplification mechanism is built into how these agents behave. Coordination benchmarks find that agents routinely accept neighbor information without verifying it. That is exactly the property that turns a connection into an error conduit, and degradation scales predictably with network size (Why do multi-agent systems fail to coordinate at scale?). Layer on the documented social failure modes (silent agreement, degeneration of thought, sycophantic accommodation), and you have agents that not only pass errors along but actively converge on them (Why do multi-agent systems fail despite individual capability?). The topology decides how many hops that contagion can travel.
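A worked model helps show why unverified relay makes degradation grow with network size. The sketch below is a toy built on three assumptions. Agents form a linear chain. Each agent independently introduces an error with probability `p_error`. Each agent repairs a wrong input it receives with probability `p_catch`, where 0 means it trusts its neighbor completely. It defines amplification as the chain's final error rate divided by a single agent's error rate. That definition and every parameter value are assumptions for illustration, not the measurement used in the cited benchmarks.

```python
def chain_error_rate(n_agents: int, p_error: float, p_catch: float = 0.0) -> float:
    """Probability that the final output of an n-agent relay chain is wrong.

    Toy assumptions (hypothetical, not taken from the cited benchmarks):
    - the task input entering the chain is correct;
    - each agent independently introduces a fresh error with probability p_error;
    - an agent that receives a wrong input repairs it with probability p_catch
      before doing its own step (p_catch = 0 models unverified relay).
    """
    p_right = 1.0
    for _ in range(n_agents):
        # A wrong input is repaired with probability p_catch...
        p_right = p_right + (1.0 - p_right) * p_catch
        # ...then the agent's own step can still introduce an error.
        p_right *= 1.0 - p_error
    return 1.0 - p_right


def amplification(n_agents: int, p_error: float, p_catch: float = 0.0) -> float:
    """System error rate relative to a single agent's error rate."""
    return chain_error_rate(n_agents, p_error, p_catch) / p_error


for n in (1, 5, 20):
    unverified = amplification(n, p_error=0.05)
    verified = amplification(n, p_error=0.05, p_catch=0.5)
    print(n, round(unverified, 2), round(verified, 2))
```

With `p_catch = 0`, the factor keeps climbing with chain length, because every fresh error is carried forward. With `p_catch = 0.5` and `p_error = 0.05`, per-hop repair balances new errors, and the factor levels off just under 2. The recurrence's fixed point is a correct rate of 0.475 / 0.525 ≈ 0.905. That is an error rate of about 0.095, or roughly 1.9× a single agent.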

The corpus also points toward the counter-design. Because errors amplify through unverified conversational relay, replacing free-form chat with structured, pullable artifacts cuts the noise that propagates. Agents reading from a shared engineering document coordinate better than agents whispering down a chain (Does structured artifact sharing outperform conversational coordination?). More broadly, reliability tends to come from externalizing memory, skills, and protocols into a harness layer, rather than trusting each agent to re-derive correctness in flight (Where does agent reliability actually come from?). There is also a sobering caveat on whether "better coordination" is even what's happening. One analysis attributes ~80% of multi-agent performance variance to raw token spend, not coordination intelligence (How does test-time scaling work at the agent level?). That is a reminder to check that a topology is actually suppressing errors and not just buying you more compute.
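The shared-document pattern described above can be sketched as a small publish/pull store. This is a minimal sketch under stated assumptions. The `Artifact` and `ArtifactBoard` names, the versioning scheme, and the document kinds are hypothetical, and this is not MetaGPT's actual API. The point it illustrates is structural: every consumer pulls the author's exact text in one hop, instead of receiving a paraphrase that has passed through several relays.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    kind: str      # hypothetical document type, e.g. "prd" or "api_spec"
    author: str
    version: int
    body: str


@dataclass
class ArtifactBoard:
    """Hypothetical shared store: agents publish structured documents and pull
    the latest version by kind, instead of relaying paraphrases peer to peer."""
    docs: dict[str, list[Artifact]] = field(default_factory=dict)

    def publish(self, kind: str, author: str, body: str) -> Artifact:
        history = self.docs.setdefault(kind, [])
        artifact = Artifact(kind, author, len(history) + 1, body)
        history.append(artifact)  # old versions are kept so changes can be audited
        return artifact

    def pull(self, kind: str) -> Optional[Artifact]:
        history = self.docs.get(kind)
        return history[-1] if history else None


board = ArtifactBoard()
board.publish("api_spec", "architect", "GET /items returns a JSON list of items")
# Every consumer reads the author's exact text: one hop, no re-paraphrasing.
views = {reader: board.pull("api_spec") for reader in ("engineer", "qa", "reviewer")}
assert len({v.body for v in views.values()}) == 1
```

The version history gives a harness layer something to audit, which fits the externalization point above. The board itself does not verify content, though. A wrong spec published there still reaches every reader in a single hop.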

The thing you might not have expected: the failure mode is less often a single corrupted answer and more often the system seizing up. LLM-agent consensus tends to break down through *liveness loss* (timeouts, stalled convergence) rather than value corruption. It also gets worse with group size, even when no adversary is present (Can LLM agent groups reliably reach consensus together?). So topology choice shapes two things at once: how far a wrong value travels, and whether the group can finish at all.
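A toy probability model shows how agreement can fail by timeout alone, with no adversary and no wrong value. The sketch rests on hypothetical assumptions. In each round, every agent independently lands on the shared value with probability `p_agree`. Rounds are independent. The group needs unanimity within `max_rounds`. Real LLM-agent protocols are messier, so treat this as an illustration of the mechanism, not a model of the cited simulations.

```python
def p_reach_agreement(n_agents: int, p_agree: float, max_rounds: int) -> float:
    """Probability that a group reaches unanimous agreement before its round budget runs out.

    Toy assumptions (hypothetical): in each round every agent independently lands
    on the shared value with probability p_agree, rounds are independent, there is
    no adversary, and the group stops at the first unanimous round. Exhausting the
    budget is a liveness failure (timeout); the model has no path to agreeing on a
    wrong value.
    """
    p_unanimous_round = p_agree ** n_agents
    return 1.0 - (1.0 - p_unanimous_round) ** max_rounds


for n in (3, 10, 30):
    print(n, round(p_reach_agreement(n, p_agree=0.9, max_rounds=5), 3))
```

With `p_agree = 0.9` and five rounds, a three-agent group almost always finishes. A thirty-agent group fails to finish about four times in five. Running out of rounds is the only failure path in this model, so every failure it counts is a liveness loss.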

Sources (8 notes)

When does adding more agents actually help systems?

Across 180 configurations, three dominant effects predict multi-agent success: tool-coordination trade-offs harm complex tasks, coordination stops helping above 45% accuracy, and topology choice controls error amplification by 4–17×. Architecture-task alignment, not agent count, determines outcomes.

How does workflow position shape attack propagation in multi-agent systems?

FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.

Why do multi-agent systems fail to coordinate at scale?

The AgentsNet benchmark shows that agents fail to coordinate strategies, either by agreeing too late or by adopting strategies without informing neighbors. Agents accept neighbor information without verification, which enables error propagation, even though they remain capable of detecting direct conflicts.

Why do multi-agent systems fail despite individual capability?

Multi-agent systems exhibit specific failure modes—silent agreement, degeneration of thought, and social accommodation—that mirror individual reasoning failures at group scale. Real-world autonomous task completion plateaus near 30% regardless of agent count; capability gains require deliberation diversity, expertise prerequisites, and formal coordination architectures.

Does structured artifact sharing outperform conversational coordination?

MetaGPT demonstrates that agents producing standardized engineering documents achieve superior coordination compared to conversational exchange. Active information pulling from shared environments eliminates noise and mirrors efficient human workplace infrastructure.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

How does test-time scaling work at the agent level?

Research shows 80% of multi-agent performance variance comes from token budget, not coordination intelligence. LatentMAS and shared-KV-cache approaches offer ways to decouple performance gains from token costs.

Can LLM agent groups reliably reach consensus together?

Across hundreds of simulations, LLM-agent groups frequently fail to reach valid agreement due to timeouts and stalled convergence rather than subtle value corruption. Agreement degrades with group size even without Byzantine agents present.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures (arXiv, match 5.02)
Towards a Science of Scaling Agent Systems (arXiv, match 4.29)
Scaling Behavior of Single LLM-Driven Multi-Agent Systems (arXiv, match 4.19)
Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets (arXiv, match 4.18)
How we built our multi-agent research system (arXiv, match 2.53)
LLMs Corrupt Your Documents When You Delegate (arXiv, match 2.49)
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs (arXiv, match 2.48)
From Model Scaling to System Scaling: Scaling the Harness in Agentic AI (arXiv, match 2.48)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about error amplification in multi-agent system topologies. The question remains open: *Does collaboration topology choice fundamentally govern error propagation, or have newer models, training methods, or orchestration patterns dissolved this constraint?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026 and include:
- Topology alone swings error amplification by 4–17× across 180 configurations; coordination stops helping above ~45% accuracy (2025–2026).
- Errors concentrate at high-influence subtasks and propagate farther when framed as evidence; downstream agents relay without verification, scaling degradation predictably with network size (2025–2026).
- Agents exhibit silent agreement and sycophantic accommodation, converting connections into error conduits (2025–2026).
- Structured artifacts (engineering documents, pulled state) outperform free-form chat; externalizing memory, skills, and protocols into a harness layer improves reliability (2026).
- ~80% of multi-agent performance variance traces to token spend, not coordination intelligence; consensus fails primarily through liveness loss (timeouts, stalled convergence) rather than value corruption, worsening with group size (2025–2026).

Anchor papers (verify; mind their dates):
- arXiv:2507.08616 (AgentsNet, 2025-07): coordination and reasoning under topology constraints.
- arXiv:2605.11514 (FLOWSTEER, 2026-05): workflow steering and planning-time vulnerabilities.
- arXiv:2604.08224 (Externalization, 2026-04): harness-layer reliability.
- arXiv:2603.01213 (Can AI Agents Agree?, 2026-03): consensus failure modes.

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 4–17× amplification swing, the 45% accuracy cliff, unverified relay, and liveness-loss-as-primary-failure: judge whether post-2026 scaling, constitutional training, multi-turn self-verification, or new orchestration (e.g., structured state machines, formal protocols) have relaxed or overturned any. Separate the durable question (topology's role in error *paths*) from the perishable limitation (e.g., agents blindly relay). Where a constraint still holds, name it plainly.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially anything claiming single-agent or homogeneous-swarm topologies beat hierarchies, or that token-spend dominates topology entirely.

(3) Propose 2 research questions that ASSUME the topology regime may have shifted: e.g., *Do constitutional or self-verification layers neutralize amplification in small-world vs. star topologies?* or *Does formal protocol synthesis eliminate liveness loss in dynamically growing agent pools?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
