What four decisions matter most in multi-agent system routing?
This explores the specific routing choices a multi-agent system has to make. The corpus has a direct four-part answer, plus a lot of adjacent material on why each of the four matters.
This reads as a question about what a router in a multi-agent system actually decides when it sets up a team to solve a task. The cleanest answer comes from MasRouter, which argues these aren't separate problems to solve one at a time. They're four decisions that have to be optimized together: the collaboration topology (who talks to whom), the number of agents, the role each agent plays, and which LLM gets assigned to each agent (see "What decisions must multi-agent routing systems optimize simultaneously?"). Doing all four jointly, rather than bolting a model-picker onto a fixed team shape, is what let it beat single-model routing by 3.51% on accuracy while cutting HumanEval costs by 49%. So the short version is: topology, count, roles, and model assignment.
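To make the four decisions concrete, here is a minimal sketch of a cascaded router in which each choice conditions on the ones before it. It is not MasRouter's implementation: the difficulty thresholds, topology names, role pool, and model labels are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoutingPlan:
    """The four routing decisions bundled into one plan."""
    topology: str              # who talks to whom
    num_agents: int            # how many agents are spawned
    roles: Tuple[str, ...]     # one role per agent
    models: Tuple[str, ...]    # one model label per agent


def route(task_difficulty: float) -> RoutingPlan:
    """Toy cascade: topology -> count -> roles -> models.

    `task_difficulty` is a hypothetical score in [0, 1]; the cut-offs
    below are illustrative assumptions, not values from any paper.
    """
    if not 0.0 <= task_difficulty <= 1.0:
        raise ValueError("task_difficulty must be in [0, 1]")

    # 1. Topology, chosen from the task alone.
    if task_difficulty < 0.3:
        topology = "single"
    elif task_difficulty < 0.7:
        topology = "chain"
    else:
        topology = "star"

    # 2. Agent count, conditioned on topology.
    num_agents = {"single": 1, "chain": 2, "star": 3}[topology]

    # 3. Roles, conditioned on count.
    role_pool = ("solver", "reviewer", "planner")
    roles = role_pool[:num_agents]

    # 4. Model per agent, conditioned on role.
    model_for_role = {
        "solver": "large-model",
        "reviewer": "small-model",
        "planner": "small-model",
    }
    models = tuple(model_for_role[role] for role in roles)

    return RoutingPlan(topology, num_agents, roles, models)


if __name__ == "__main__":
    print(route(0.1))
    print(route(0.9))
```

The point of the cascade is that a later decision (which model) cannot be made sensibly without the earlier ones (which role, in which team shape), which is why a model-picker bolted onto a fixed team leaves value on the table.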
What's interesting is that the rest of the corpus reads like a stress-test of why each of those four is load-bearing. Take topology: a study across 180 configurations found that topology choice alone controls how badly errors get amplified, by a factor of 4 to 17×, and that architecture-task fit, not raw agent count, decides whether coordination helps at all (see "When does adding more agents actually help systems?"). That's a direct warning about the first and second decisions: get the shape wrong and you don't just lose a little accuracy, you build an error megaphone. Topology even shapes security: malicious or sycophantic signals travel farther when injected at high-influence positions where dependencies converge (see "How does workflow position shape attack propagation in multi-agent systems?").
The model-assignment decision turns out to be the one with the biggest cost lever, and the corpus pushes back on the instinct to use a big model everywhere. Small language models handle most of the repetitive, well-defined subtasks at 10–30× lower cost, which makes a mixed team (small models by default, large ones only where needed) the economically rational pattern (see "Can small language models handle most agent tasks?"). There's a sobering counterpoint, too: about 80% of the variance in measured multi-agent performance comes from how many tokens you spend, not how cleverly the agents coordinate (see "How does test-time scaling work at the agent level?"). So the model-assignment decision is partly a disguised budget decision.
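The arithmetic behind the mixed-team argument can be sketched in a few lines. The 10–30× cost ratio comes from the notes; the fraction of subtasks a small model can absorb (0.8 in the usage lines) is an assumption for illustration, as is the function name.

```python
def relative_team_cost(small_fraction: float, cost_ratio: float) -> float:
    """Cost of a mixed team relative to running every subtask on the large model.

    small_fraction: share of subtasks routed to the small model (assumed).
    cost_ratio: how many times cheaper the small model is per subtask.
    """
    if not 0.0 <= small_fraction <= 1.0:
        raise ValueError("small_fraction must be in [0, 1]")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # Small-model share is discounted by the ratio; the rest pays full price.
    return small_fraction / cost_ratio + (1.0 - small_fraction)


if __name__ == "__main__":
    for ratio in (10, 30):
        print(ratio, round(relative_team_cost(0.8, ratio), 2))
```

Under that assumed 80% split, the mixed team costs roughly 0.28 of the all-large baseline at a 10× ratio and roughly 0.23 at 30×. Most of the saving comes from the routing split itself, and the residual large-model share quickly dominates the bill.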
Where the question gets genuinely subverted is the agent-count decision. Several notes suggest the bravest routing choice is sometimes to route to fewer agents, or just one. Coordination stops helping above roughly 45% accuracy (see "When does adding more agents actually help systems?"), and single agents outperform teams in many cases as base models get stronger, with three formal failure types (node-level bottlenecks, edge-level overwhelm, path-level error propagation) explaining when teams break down (see "When do multi-agent systems actually outperform single agents?"). Larger groups also fail to reach consensus more often, mostly by stalling out rather than being corrupted (see "Can LLM agent groups reliably reach consensus together?"), and coordination degrades predictably as the network scales (see "Why do multi-agent systems fail to coordinate at scale?").
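One way to encode that restraint is a gate that defaults to a single agent and only spawns a team when the evidence supports it. This is a sketch under stated assumptions: it reads the 45% figure as a cut-off on an estimated single-agent accuracy for the task, which is one interpretation of the finding, and the team-size cap is a hypothetical parameter.

```python
def choose_agent_count(
    baseline_accuracy: float,
    team_size: int = 3,
    max_agents: int = 4,
    threshold: float = 0.45,
) -> int:
    """Return 1 when coordination is unlikely to help, else a capped team size.

    baseline_accuracy: estimated single-agent accuracy on this task type
        (how this estimate is obtained is left open here).
    threshold: assumed cut-off above which coordination stops helping.
    max_agents: hypothetical cap, since larger groups stall more often.
    """
    if not 0.0 <= baseline_accuracy <= 1.0:
        raise ValueError("baseline_accuracy must be in [0, 1]")
    if team_size < 1 or max_agents < 1:
        raise ValueError("team_size and max_agents must be at least 1")
    if baseline_accuracy >= threshold:
        return 1  # a single agent is already good enough
    return min(team_size, max_agents)


if __name__ == "__main__":
    print(choose_agent_count(0.60))  # strong baseline
    print(choose_agent_count(0.30))  # weak baseline
```

The design choice worth noting is the direction of the default: the function has to be argued into a team, not out of one.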
The thing worth taking away: the four MasRouter decisions are real, but the corpus reframes routing as a discipline of restraint. The cross-cutting move is to make capability and constraints first-class: matching agents by what they can do and what they cost, via versioned capability vectors, rather than wiring teams by hand (see "Can semantic capability vectors replace manual agent routing?"). Topology, count, roles, and models are the four dials, but every note here points the same direction: the default answer to 'add more coordination' should often be 'less.'
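Capability-and-constraint matching can be sketched as "filter by budget, then rank by similarity." The notes describe vectors stored in HNSW indices; the version below swaps in a brute-force cosine scan to stay dependency-free, so it shows the matching logic but not the sub-linear scaling. The `AgentCard` fields, agent names, and vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AgentCard:
    name: str
    version: str                    # capability vectors are versioned
    capability: Tuple[float, ...]   # hypothetical embedding of what the agent can do
    cost_per_call: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_agent(
    task_vector: Sequence[float],
    agents: Sequence[AgentCard],
    budget: float,
) -> Optional[AgentCard]:
    """Apply the budget constraint first, then pick the closest capability."""
    affordable = [a for a in agents if a.cost_per_call <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda a: cosine(task_vector, a.capability))


if __name__ == "__main__":
    registry = [
        AgentCard("coder-large", "1.2.0", (0.9, 0.1), 10.0),
        AgentCard("coder-small", "0.4.1", (0.8, 0.2), 1.0),
        AgentCard("writer-small", "2.0.0", (0.1, 0.9), 1.0),
    ]
    print(match_agent((1.0, 0.0), registry, budget=2.0))
```

Putting the constraint filter ahead of the similarity ranking means a tight budget changes which agent wins without anyone rewiring the team by hand.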
Sources (9 notes)
MasRouter shows that routing in multi-agent systems must jointly optimize collaboration topology, agent count, role allocation, and per-agent LLM assignment through a cascaded controller. This unified approach surpasses single-model routing by 3.51% accuracy while cutting HumanEval costs by 49%.
Across 180 configurations, three dominant effects predict multi-agent success: tool-coordination trade-offs harm complex tasks, coordination stops helping above 45% accuracy, and topology choice controls error amplification by 4–17×. Architecture-task alignment, not agent count, determines outcomes.
FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.
SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.
Research shows 80% of multi-agent performance variance comes from token budget, not coordination intelligence. LatentMAS and shared-KV-cache approaches offer ways to decouple performance gains from token costs.
Empirical analysis shows the performance gap between multi-agent systems (MAS) and single-agent systems (SAS) narrows with stronger models, with SAS outperforming in many cases. Three formal defect types (node-level bottlenecks, edge-level overwhelm, and path-level error propagation) explain when single agents win.
Across hundreds of simulations, LLM-agent groups frequently fail to reach valid agreement due to timeouts and stalled convergence rather than subtle value corruption. Agreement degrades with group size even without Byzantine agents present.
AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.
Versioned Capability Vectors embedded in HNSW indices couple semantic matching with policy and budget constraints, making capability discovery a first-class operation that scales sub-linearly as agent heterogeneity increases.