INQUIRING LINE

What four decisions matter most in multi-agent system routing?

This explores the specific routing choices a multi-agent system has to make — and the corpus has a direct four-part answer, plus a lot of adjacent material on why each of those four matters.


This reads as a question about what a router in a multi-agent system actually decides when it sets up a team to solve a task. The cleanest answer comes from MasRouter, which argues that these are not separate problems to solve one at a time but four decisions that have to be optimized together: the collaboration topology (who talks to whom), the number of agents, the role each agent plays, and which LLM is assigned to each agent (see "What decisions must multi-agent routing systems optimize simultaneously?"). Doing all four jointly, rather than bolting a model-picker onto a fixed team shape, is what let it beat single-model routing by 3.51% on accuracy while cutting HumanEval costs by 49%. So the short version is: topology, count, roles, and model assignment.
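To make "optimized together" concrete, here is a minimal sketch that treats a routing plan as one object with four fields and scores whole plans rather than picking each field in isolation. It is not MasRouter's method: the notes describe a cascaded controller, whereas this sketch swaps in brute-force enumeration over a tiny search space. The names `RoutingPlan`, `enumerate_plans`, and `toy_utility`, along with every number in the utility, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence, Tuple


@dataclass(frozen=True)
class RoutingPlan:
    """One candidate answer to all four routing decisions at once."""
    topology: str              # who talks to whom
    num_agents: int            # how many agents
    roles: Tuple[str, ...]     # role of each agent
    models: Tuple[str, ...]    # model assigned to each agent


def enumerate_plans(topologies: Sequence[str], max_agents: int,
                    role_pool: Sequence[str],
                    model_pool: Sequence[str]) -> Iterator[RoutingPlan]:
    """Yield every combination of the four decisions (brute force)."""
    for topology in topologies:
        for n in range(1, max_agents + 1):
            for roles in product(role_pool, repeat=n):
                for models in product(model_pool, repeat=n):
                    yield RoutingPlan(topology, n, roles, models)


# Hypothetical per-call costs; not taken from any source.
MODEL_COST = {"small": 1.0, "large": 10.0}


def toy_utility(plan: RoutingPlan) -> float:
    """Made-up quality-minus-cost score in which the decisions interact."""
    quality = 10.0 * len(set(plan.roles))            # role coverage
    quality += sum(15.0 for role, model in zip(plan.roles, plan.models)
                   if role == "planner" and model == "large")
    if plan.topology == "mesh":                      # pay per pairwise link
        quality -= 3.0 * plan.num_agents * (plan.num_agents - 1) / 2
    cost = sum(MODEL_COST[m] for m in plan.models)
    return quality - cost


plans = list(enumerate_plans(["chain", "mesh"], 2,
                             ["planner", "coder"], ["small", "large"]))
best = max(plans, key=toy_utility)
```

In this toy setup the winning plan is a two-agent chain with a large-model planner and a small-model coder. Choosing the model first, on a fixed single-agent team, would never surface that combination, which is the interaction the joint framing is meant to capture.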

What's interesting is that the rest of the corpus reads like a stress test of why each of those four is load-bearing. Take topology: a study across 180 configurations found that topology choice alone controls how badly errors get amplified, by a factor of 4 to 17×, and that architecture-task fit, not raw agent count, decides whether coordination helps at all (see "When does adding more agents actually help systems?"). That's a direct warning about the first and second decisions: get the shape wrong and you don't just lose a little accuracy, you build an error megaphone. Topology even shapes security: malicious or sycophantic signals travel farther when injected at high-influence positions where dependencies converge (see "How does workflow position shape attack propagation in multi-agent systems?").

The model-assignment decision turns out to be the one with the biggest cost lever, and the corpus pushes back on the instinct to use a big model everywhere. Small language models handle most of the repetitive, well-defined subtasks at 10–30× lower cost, which makes a mixed team (small models by default, large ones only where needed) the economically rational pattern (see "Can small language models handle most agent tasks?"). There's a sobering counterpoint, too: roughly 80% of the variance in multi-agent performance is attributed to how many tokens you spend, not how cleverly the agents coordinate (see "How does test-time scaling work at the agent level?"). So the model-assignment decision is partly a disguised budget decision.
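The cost argument for a mixed team can be worked through with a line of arithmetic. The sketch below assumes that a fraction `small_share` of subtasks goes to a small model that is `cost_advantage` times cheaper per subtask, and that the rest stay on the large model. Only the 10–30× range comes from the notes; the function name and the 70% share used in the example are assumptions.

```python
def blended_cost_ratio(small_share: float, cost_advantage: float) -> float:
    """Cost of a mixed team relative to an all-large-model team (1.0 = equal).

    small_share:    fraction of subtasks routed to the small model (0..1)
    cost_advantage: how many times cheaper the small model is per subtask
    """
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    if cost_advantage <= 0:
        raise ValueError("cost_advantage must be positive")
    # Small-model subtasks cost 1/cost_advantage each; the rest cost 1 each.
    return small_share / cost_advantage + (1.0 - small_share)


# Assumed 70% small-model share, at both ends of the 10-30x range.
low_end = blended_cost_ratio(0.7, 10.0)    # about 0.37 of the all-large cost
high_end = blended_cost_ratio(0.7, 30.0)   # about 0.32 of the all-large cost
```

The result can never drop below `1 - small_share`, so under these assumptions the share of work that can safely go to small models matters more than whether the per-task advantage is 10× or 30×.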

Where the question gets genuinely subverted is the agent-count decision. Several notes suggest the bravest routing choice is sometimes to route to fewer agents, or to just one. Coordination stops helping above roughly 45% accuracy (see "When does adding more agents actually help systems?"), and single agents outperform teams in many cases as base models get stronger, with three formal failure types (node-level bottlenecks, edge-level overwhelm, and path-level error propagation) explaining when teams break down (see "When do multi-agent systems actually outperform single agents?"). Larger groups also fail to reach consensus more often, mostly by stalling out rather than being corrupted (see "Can LLM agent groups reliably reach consensus together?"), and coordination degrades predictably as the network scales (see "Why do multi-agent systems fail to coordinate at scale?").
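One way to act on this is a guard that runs before any team is assembled. The sketch below is a hypothetical illustration, not something the notes prescribe: it reads the 45% figure as a threshold on single-agent baseline accuracy, which is an assumption, and any real threshold would be specific to the task and benchmark.

```python
def choose_team_size(single_agent_accuracy: float,
                     requested_agents: int,
                     threshold: float = 0.45) -> int:
    """Fall back to one agent when the single-agent baseline is already strong.

    threshold defaults to the 45% figure reported in the notes; treating it
    as a cutoff on single-agent accuracy is an assumption of this sketch.
    """
    if requested_agents < 1:
        raise ValueError("requested_agents must be at least 1")
    if single_agent_accuracy >= threshold:
        return 1                  # coordination is unlikely to pay for itself
    return requested_agents       # weak baseline: a team may still help
```

A router built this way only goes on to weigh topology, roles, and models once the guard has allowed more than one agent.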

The thing worth taking away: the four MasRouter decisions are real, but the corpus reframes routing as a discipline of restraint. The cross-cutting move is to make capability and constraints first-class, matching agents by what they can do and what they cost, via versioned capability vectors, rather than wiring teams by hand (see "Can semantic capability vectors replace manual agent routing?"). Topology, count, roles, and models are the four dials, but every note here points in the same direction: the default answer to 'add more coordination' should often be 'less.'
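Matching by capability and cost can be sketched in a few lines. The example below filters agents by a per-call budget first, then picks the closest capability vector by cosine similarity. It deliberately uses a brute-force linear scan in place of the HNSW index described in the notes, and `AgentCard`, `route`, the vectors, versions, and costs are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AgentCard:
    name: str
    version: str                   # capability vectors are versioned
    capability: Sequence[float]    # hypothetical embedding of what it can do
    cost_per_call: float           # hypothetical cost units


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(task_vector: Sequence[float], agents: Sequence[AgentCard],
          budget: float) -> Optional[AgentCard]:
    """Apply the budget constraint first, then match on capability."""
    affordable = [a for a in agents if a.cost_per_call <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda a: cosine(task_vector, a.capability))


agents = [
    AgentCard("sql-small", "1.2.0", (0.9, 0.1, 0.0), 1.0),
    AgentCard("sql-large", "2.0.0", (1.0, 0.0, 0.0), 20.0),
    AgentCard("summarizer-small", "1.0.0", (0.0, 0.2, 0.9), 1.0),
]
task = (1.0, 0.0, 0.0)
cheap_pick = route(task, agents, budget=5.0)     # sql-small
rich_pick = route(task, agents, budget=50.0)     # sql-large
```

With a tight budget the router settles for the small SQL agent, and it reaches for the large one only when the budget allows it, so the constraint is part of the match rather than an afterthought.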


Sources (9 notes)

What decisions must multi-agent routing systems optimize simultaneously?

MasRouter shows that routing in multi-agent systems must jointly optimize collaboration topology, agent count, role allocation, and per-agent LLM assignment through a cascaded controller. This unified approach surpasses single-model routing by 3.51% accuracy while cutting HumanEval costs by 49%.

When does adding more agents actually help systems?

Across 180 configurations, three dominant effects predict multi-agent success: tool-coordination trade-offs harm complex tasks, coordination stops helping above 45% accuracy, and topology choice controls error amplification by 4–17×. Architecture-task alignment, not agent count, determines outcomes.

How does workflow position shape attack propagation in multi-agent systems?

FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

How does test-time scaling work at the agent level?

Research shows 80% of multi-agent performance variance comes from token budget, not coordination intelligence. LatentMAS and shared-KV-cache approaches offer ways to decouple performance gains from token costs.

When do multi-agent systems actually outperform single agents?

Empirical analysis shows MAS performance gaps narrow with stronger models, with SAS outperforming in many cases. Three formal defect types—node-level bottlenecks, edge-level overwhelm, and path-level error propagation—explain when single agents win.

Can LLM agent groups reliably reach consensus together?

Across hundreds of simulations, LLM-agent groups frequently fail to reach valid agreement due to timeouts and stalled convergence rather than subtle value corruption. Agreement degrades with group size even without Byzantine agents present.

Why do multi-agent systems fail to coordinate at scale?

AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.

Can semantic capability vectors replace manual agent routing?

Versioned Capability Vectors embedded in HNSW indices couple semantic matching with policy and budget constraints, making capability discovery a first-class operation that scales sub-linearly as agent heterogeneity increases.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about multi-agent system routing decisions. The question remains: which four decisions matter most when routing tasks across agent teams?

What a curated library found — and when (dated claims, not current truth):
Findings span 2025–2026. A curated library identified four load-bearing routing decisions optimized jointly — collaboration topology, agent count, role assignment, and LLM model selection (MasRouter, ~2025). Topology alone can amplify errors by 4–17× depending on task fit, not raw count (~2025). Small models handle 80–90% of agentic subtasks at 10–30× lower cost; mixed-team routing (small by default, large only where needed) is the economically rational pattern (~2025–2026). Performance often correlates with total tokens spent, not coordination cleverness (~2026). Single-agent systems increasingly outperform multi-agent teams as base models strengthen; coordination stops helping above accuracy thresholds, with formal failure modes (bottlenecks, error propagation) explaining when teams degrade (~2025–2026). Larger consensus groups stall rather than corrupt; coordination degrades predictably with network scale (~2026). Capability-driven routing via versioned capability vectors replaces hand-wired teams (~2025).

Anchor papers (verify; mind their dates):
- MasRouter (arXiv:2502.11133, 2025)
- Single-agent or Multi-agent Systems? Why Not Both? (arXiv:2505.18286, 2025)
- Towards a Science of Scaling Agent Systems (arXiv:2512.08296, 2025)
- Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning (arXiv:2604.02460, 2026)

Your task:
(1) RE-TEST EACH CONSTRAINT. For each of topology, count, role, and model-assignment decisions, ask: have newer models (o1, o3, Claude 4), training methods (mixture-of-experts routing, dynamic role learning), orchestration frameworks (memory banks, persistent agent pools, caching), or evaluation harnesses (standardized benchmarks for routing trade-offs) since relaxed or overturned the limits? Separate the durable question — which four *types* of decisions interact? — from perishable limits: does error amplification still scale 4–17×? Does single-agent now dominate *all* reasoning tasks, or only some? Does the 10–30× cost gap still hold? Flag what's likely still true and what may have shifted.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Has anyone shown that the four-decision framework is incomplete, or that a different factorization (e.g., input-splitting before routing, or post-hoc ensemble voting) outperforms joint optimization?

(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., what happens to topology importance if agents can dynamically rewrite their own roles mid-task? If we route *within* a single forward pass using MoE, do the four decisions collapse into one?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
