How does collaboration topology choice affect error amplification in multi-agent systems?
This explores how the *shape* of agent-to-agent connections — who talks to whom, in what order — changes whether a single mistake stays contained or snowballs across a multi-agent system.
This explores how the wiring diagram of a multi-agent system — its topology — governs whether one agent's error gets damped out or magnified. The corpus has a direct answer to start with: across 180 configurations, topology choice alone swings error amplification by 4–17× When does adding more agents actually help systems?. That's the headline number, and it reframes the whole design problem — you're not just choosing how many agents to add, you're choosing how badly a single bad output can travel. The same work finds coordination stops helping once accuracy passes ~45%, which means topology matters most precisely in the messy, error-prone regime where you'd most want a safety margin.
Why does shape matter so much? Because errors don't spread uniformly — they concentrate where dependencies converge. FLOWSTEER shows that a malicious or wrong signal injected into a high-influence subtask propagates much farther than the same signal in a peripheral one, and that downstream agents relay it especially when it's framed as evidence rather than a command How does workflow position shape attack propagation in multi-agent systems?. So topology isn't just a graph of who's connected; it's a map of influence chokepoints. A star or pipeline with a central hub amplifies whatever flows through that hub; a flatter mesh distributes both the work and the blast radius differently.
The amplification mechanism is built into how these agents behave. Coordination benchmarks find agents routinely accept neighbor information without verifying it, which is exactly the property that turns a connection into an error conduit — degradation scales predictably with network size Why do multi-agent systems fail to coordinate at scale?. Layer on the documented social failure modes — silent agreement, degeneration of thought, sycophantic accommodation — and you have agents that not only pass errors along but actively converge on them Why do multi-agent systems fail despite individual capability?. The topology decides how many hops that contagion gets.
The corpus also points toward the counter-design. Errors amplify through unverified conversational relay, so replacing free-form chat with structured, pullable artifacts cuts the noise that propagates: agents reading from a shared engineering document coordinate better than agents whispering down a chain Does structured artifact sharing outperform conversational coordination?. More broadly, reliability tends to come from externalizing memory, skills, and protocols into a harness layer rather than trusting each agent to re-derive correctness in-flight Where does agent reliability actually come from?. And there's a sobering caveat on whether 'better coordination' is even what's happening: one analysis attributes ~80% of multi-agent performance variance to raw token spend, not coordination intelligence How does test-time scaling work at the agent level? — a reminder to check that a topology is actually suppressing errors and not just buying you more compute.
The thing you might not have expected: the failure mode is less often a single corrupted answer and more often the system seizing up. LLM-agent consensus tends to break through *liveness loss* — timeouts, stalled convergence — rather than value corruption, and it gets worse with group size even with no adversary present Can LLM agent groups reliably reach consensus together?. So topology choice is shaping two things at once: how far a wrong value travels, and whether the group can finish at all.
Sources 8 notes
Across 180 configurations, three dominant effects predict multi-agent success: tool-coordination trade-offs harm complex tasks, coordination stops helping above 45% accuracy, and topology choice controls error amplification by 4–17×. Architecture-task alignment, not agent count, determines outcomes.
FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.
AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.
Multi-agent systems exhibit specific failure modes—silent agreement, degeneration of thought, and social accommodation—that mirror individual reasoning failures at group scale. Real-world autonomous task completion plateaus near 30% regardless of agent count; capability gains require deliberation diversity, expertise prerequisites, and formal coordination architectures.
MetaGPT demonstrates that agents producing standardized engineering documents achieve superior coordination compared to conversational exchange. Active information pulling from shared environments eliminates noise and mirrors efficient human workplace infrastructure.
Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.
Research shows 80% of multi-agent performance variance comes from token budget, not coordination intelligence. LatentMAS and shared-KV-cache approaches offer ways to decouple performance gains from token costs.
Across hundreds of simulations, LLM-agent groups frequently fail to reach valid agreement due to timeouts and stalled convergence rather than subtle value corruption. Agreement degrades with group size even without Byzantine agents present.