Can you compose independent LLM experts without synchronization overhead?
This explores whether you can train and combine specialized LLM 'experts' that never had to talk to each other during training — and whether the same idea of synchronization-free composition extends from training to inference and multi-agent coordination.
The question here is whether independent LLM experts can be composed cheaply, without the synchronization that normally makes distributed systems expensive. The corpus has a surprisingly direct answer at training time and a more cautionary one at runtime. The cleanest 'yes' is Branch-Train-MiX (Can asynchronous expert training beat synchronized distributed LLM training?), which trains domain experts fully in parallel with no cross-talk, then merges their feed-forward layers as mixture-of-experts modules and learns a lightweight router that picks between them per token. The synchronization overhead that usually dominates distributed training is skipped, and the merged model still achieves a better accuracy-efficiency tradeoff than synchronized training or routing-free merging. The key move is that 'composition' is deferred: experts stay independent until a router decides how to blend them.
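The merge-then-route step described above can be sketched in a few lines. This is a toy illustration only, not the Branch-Train-MiX implementation: the expert weights and router are random stand-ins, and the sizes, the top-2 choice, and every function name are assumptions made for the example.

```python
import math
import random

random.seed(0)
D_MODEL, D_HIDDEN, N_EXPERTS, TOP_K = 4, 8, 3, 2  # toy sizes (assumed)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    """Row vector times matrix: (rows,) x (rows, cols) -> (cols,)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def make_expert():
    # Stand-in for a feed-forward block trained independently on one domain.
    return {"w_in": rand_matrix(D_MODEL, D_HIDDEN),
            "w_out": rand_matrix(D_HIDDEN, D_MODEL)}

def ffn_forward(expert, token):
    hidden = [max(h, 0.0) for h in matvec(token, expert["w_in"])]  # ReLU
    return matvec(hidden, expert["w_out"])

# Experts are built with no reference to each other; only the router sees all of them.
experts = [make_expert() for _ in range(N_EXPERTS)]
router = rand_matrix(D_MODEL, N_EXPERTS)  # random here; learned after merging in the source's account

def route(token):
    """Return [(expert_index, gate_weight)] for the top-k experts of one token."""
    logits = matvec(token, router)
    chosen = sorted(range(N_EXPERTS), key=lambda e: logits[e], reverse=True)[:TOP_K]
    peak = max(logits[e] for e in chosen)
    exps = [math.exp(logits[e] - peak) for e in chosen]  # softmax over the chosen experts
    total = sum(exps)
    return [(e, w / total) for e, w in zip(chosen, exps)]

def moe_forward(token):
    """Blend the chosen experts' outputs for one token using the gate weights."""
    out = [0.0] * D_MODEL
    for e, gate in route(token):
        expert_out = ffn_forward(experts[e], token)
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out

token = [0.5, -1.0, 0.25, 2.0]
picks = route(token)
blended = moe_forward(token)
```

The point of the sketch is the shape of the seam: the experts never reference one another, and the only shared component is the small routing matrix.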
What makes that work is a theme that runs through several other notes: composition is cheap when you isolate what each component needs to see. LLM Programs (Can algorithms control LLM reasoning better than LLMs alone?) make this explicit: wrap each model call in an algorithm that hands it only step-relevant context, and complex reasoning becomes a set of independent, debuggable sub-tasks. Decoupling reasoning from tool observations (Can reasoning and tool execution be truly decoupled?) does the same trick for agents. The agent plans first and fills in tool results later, which removes the quadratic prompt growth and sequential latency that come from interleaving reasoning with execution. In all three cases the 'overhead' being eliminated is the same thing: forcing every part to stay in lockstep with every other part.
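The idea of handing each call only step-relevant context can be made concrete with a small controller. This is a minimal sketch under stated assumptions: `fake_model` is a hypothetical stand-in for a real LLM client, and the `Step` structure and the key names are invented for illustration rather than taken from the LLM Programs work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Step:
    name: str         # key under which this step's output is stored
    needs: List[str]  # the only state keys this step may see
    template: str     # prompt template filled from `needs` alone

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real client would go here."""
    return f"[model output for: {prompt}]"

def run_program(steps: List[Step], state: Dict[str, str],
                model: Callable[[str], str] = fake_model
                ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run steps in order; the algorithm, not the model, owns control flow and state."""
    state = dict(state)  # work on a copy so the caller's state is untouched
    prompts: Dict[str, str] = {}
    for step in steps:
        visible = {key: state[key] for key in step.needs}  # information hiding
        prompt = step.template.format(**visible)
        prompts[step.name] = prompt  # kept so each sub-task can be inspected
        state[step.name] = model(prompt)
    return state, prompts

steps = [
    Step("facts", ["question"], "List the facts needed to answer: {question}"),
    Step("answer", ["question", "facts"],
         "Using these facts: {facts}\nAnswer: {question}"),
]
initial = {"question": "Which expert handles code?", "debug_log": "UNRELATED-STATE"}
final_state, prompts = run_program(steps, initial)
```

Because every prompt is built only from the keys a step declares, unrelated state such as `debug_log` never reaches a model call, and each recorded prompt can be debugged on its own.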
The runtime story is more striking still. When parallel LLM workers share a concurrent KV cache (Can multiple LLMs coordinate without explicit collaboration rules?), they coordinate by splitting plans, noticing redundant work, and adapting their strategies, all without explicit collaboration rules or fine-tuning. Reasoning models such as QwQ and DeepSeek-R1 seem to already carry the machinery for collaboration; give them a shared scratchpad and they self-organize. That suggests composition without synchronization isn't only an offline training trick; capable models can negotiate their own light coordination on the fly.
But the corpus also marks where this breaks. AgentsNet (Why do multi-agent systems fail to coordinate at scale?) shows that once you scale to many agents passing messages, coordination degrades in predictable ways: agents agree on a strategy too late or adopt one without telling their neighbors, and they accept incoming information without verifying it, so errors propagate. And the DELEGATE work (Do frontier LLMs silently corrupt documents in long workflows?) finds that long relay chains silently corrupt about a quarter of document content, with errors still compounding rather than plateauing through 50 round-trips. The pattern is sharp: composing independent experts is nearly free when a router or shared cache makes the integration point small and verifiable, and it gets expensive, or quietly wrong, exactly when components must trust each other's outputs across many sequential hops.
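A back-of-the-envelope calculation shows why small per-hop losses matter over a long relay. The geometric model here is an assumption made for illustration and is not claimed by the DELEGATE note: it treats every round-trip as preserving the same fraction of the document, then backs out what that fraction would have to be to reach roughly 25% degradation after 50 round-trips.

```python
def retained_after(hops: int, per_hop_fidelity: float) -> float:
    """Fraction of content intact after `hops` hand-offs (assumed geometric model)."""
    return per_hop_fidelity ** hops

def implied_per_hop_fidelity(total_retained: float, hops: int) -> float:
    """Per-hop fidelity that would yield `total_retained` after `hops` hand-offs."""
    return total_retained ** (1.0 / hops)

# ~25% degradation over 50 round-trips is the figure reported in the sources;
# spreading it evenly across hops is this sketch's assumption.
per_hop = implied_per_hop_fidelity(0.75, 50)

for hops in (1, 10, 25, 50):
    print(f"{hops:>2} round-trips: {retained_after(hops, per_hop):.3f} retained")
```

Under this assumption each hand-off loses well under one percent, which is the kind of error no single hop would flag and that only becomes visible once it has accumulated along the chain.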
So the honest answer is yes, with a boundary. You can compose independent experts without synchronization overhead when the merge happens at one well-defined seam: merged FFN layers with a learned router, a shared KV cache, or a planner that allocates context. What you can't escape for free is the verification cost of chained, peer-to-peer hand-offs at scale. If you want to go deeper on why that ceiling exists, the note on self-improvement bounds (What stops large language models from improving themselves?) frames it generally: reliable composition keeps needing something external to validate it.
Sources (7 notes)
Branch-Train-MiX trains domain experts in parallel without synchronization overhead, merges their feed-forward parameters as MoE experts, and learns token-level routing, achieving better accuracy-efficiency tradeoffs than synchronized training or routing-free merging.
LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.
ReWOO and Chain-of-Abstraction both decouple reasoning from tool responses through different mechanisms—planning-before-execution and abstract placeholders respectively—eliminating quadratic prompt growth and sequential latency while maintaining reasoning quality.
Existing reasoning-capable models like QwQ and DeepSeek-R1 spontaneously formulate plans, detect redundancy, and adapt strategies when given shared access to a concurrent KV cache. This coordination emerges without fine-tuning, suggesting reasoning models already possess multi-agent collaboration capabilities.
The AgentsNet benchmark shows that agents fail to coordinate strategies, either by agreeing too late or by adopting strategies without informing neighbors. Agents accept neighbor information without verification, which lets errors propagate, although they remain capable of detecting direct conflicts.
Testing 19 models across 52 domains shows even advanced systems degrade documents by ~25% over extended relay tasks, with errors compounding silently without plateauing through 50 round-trips.
Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.