SYNTHESIS NOTE
Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use · Model Architecture and Internals

Can language models discover new expertise through collaborative weight search?

Can model experts be composed through particle swarm optimization in weight space without training? This explores whether collaborative search can discover capabilities that no individual expert possesses.

Synthesis note · 2026-02-23 · sourced from Agents Multi Architecture

Model composition has two dominant approaches. Learn-to-fuse trains extra components to glue experts together, which is data-heavy and rigid. Model arithmetic combines weights directly under strong assumptions such as lion_indoors = lion_outdoors + dog_indoors - dog_outdoors, which is assumption-heavy and manual. MODEL SWARMS proposes a third way: collaborative search in weight space inspired by particle swarm optimization.
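To make the model-arithmetic assumption concrete, here is a minimal sketch of the lion/dog equation applied element-wise to toy weight vectors. The two-dimensional vectors and the `combine` helper are hypothetical illustrations, not real model weights or any library's API.

```python
def combine(a, b, c):
    """Return a + b - c element-wise for equal-length weight vectors."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("weight vectors must have the same length")
    return [x + y - z for x, y, z in zip(a, b, c)]


# Hypothetical toy weights laid out as [animal feature, scene feature].
lion_outdoors = [2.0, 1.0]
dog_indoors = [1.0, -1.0]
dog_outdoors = [1.0, 1.0]

# The arithmetic assumes the "scene" difference learned on dogs
# transfers unchanged to lions.
lion_indoors = combine(lion_outdoors, dog_indoors, dog_outdoors)
print(lion_indoors)  # [2.0, -1.0]
```

The recipe only works if that additive assumption holds, and someone has to write the equation by hand for each composition.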

Each LLM expert is a particle with a location (its model weights) and a velocity (a direction in weight space). Velocity is iteratively updated by three forces: inertia (the tendency to keep moving in the current direction), personal best (the best location this particle has found so far), and global best/worst (the best and worst locations found across all particles, which attract and repel the particle respectively). Each particle then moves by adding its updated velocity to its location.
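The update described above can be sketched on toy two-dimensional "weights". This is a minimal illustration, not the authors' implementation. The coefficient names and default values, the `step` size, the repulsion from the global worst, and `toy_utility` are all assumptions made for the example.

```python
import random


def swarm_search(starts, utility, iters=30, seed=0,
                 inertia=0.3, c_personal=0.3, c_global=0.3, c_worst=0.1,
                 step=1.0):
    """Search for a high-utility location starting from several particles.

    starts:  list of weight vectors (lists of floats), one per expert.
    utility: callable mapping a weight vector to a score (higher is better).
    All coefficients are illustrative defaults, not values from the source.
    """
    rng = random.Random(seed)
    xs = [list(s) for s in starts]               # particle locations
    vs = [[0.0] * len(s) for s in starts]        # particle velocities
    scores = [utility(x) for x in xs]
    pbest = [list(x) for x in xs]                # personal best locations
    pbest_u = list(scores)
    best_i = max(range(len(xs)), key=lambda i: scores[i])
    worst_i = min(range(len(xs)), key=lambda i: scores[i])
    gbest, gbest_u = list(xs[best_i]), scores[best_i]
    gworst, gworst_u = list(xs[worst_i]), scores[worst_i]

    for _ in range(iters):
        for i in range(len(xs)):
            # Random factors keep the search stochastic.
            r_v, r_p, r_g, r_w = (rng.random() for _ in range(4))
            vs[i] = [
                r_v * inertia * v                # inertia
                + r_p * c_personal * (p - x)     # pull toward personal best
                + r_g * c_global * (g - x)       # pull toward global best
                - r_w * c_worst * (w - x)        # push away from global worst
                for x, v, p, g, w in zip(xs[i], vs[i], pbest[i], gbest, gworst)
            ]
            # Move the particle by its updated velocity.
            xs[i] = [x + step * v for x, v in zip(xs[i], vs[i])]
            u = utility(xs[i])
            if u > pbest_u[i]:
                pbest[i], pbest_u[i] = list(xs[i]), u
            if u > gbest_u:
                gbest, gbest_u = list(xs[i]), u
            if u < gworst_u:
                gworst, gworst_u = list(xs[i]), u
    return gbest, gbest_u


def toy_utility(x):
    """Hypothetical stand-in for a validation score, peaking at (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


starts = [[0.0, 0.0], [3.0, 1.0], [-2.0, -4.0]]
best, best_u = swarm_search(starts, toy_utility)
```

Because the global best is replaced only when a strictly better score is seen, the returned utility can never be lower than that of the best starting particle. With real experts, each location would be a full set of model weights and each utility call an evaluation on the validation examples.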

Three key properties make this distinctive:

1. Training-free. No loss function, gradient descent, or backpropagation is involved. Composition needs only 200 examples as a validation signal, an amount that a training-based approach would consume in barely three batches.

2. Assumption-free. No manual specification of how experts should compose. The swarm automatically discovers better adapted experts through collaborative search.

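3. Any adaptation objective. The utility function can be anything: dataset performance, reward model scores, human interests. The search only consumes a utility score for each candidate, so changing the objective means swapping the utility function rather than redesigning the composition procedure.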
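The third property can be sketched as an interface: if the search only ever calls a function that maps a candidate to a score, then different objectives are interchangeable. Everything below is a hypothetical illustration. The `Model` and `Utility` type aliases, the two utility builders, and the toy "models" are assumptions, not part of MODEL SWARMS.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-ins: a "model" maps a prompt to an answer,
# and a utility maps a model to a single score (higher is better).
Model = Callable[[str], str]
Utility = Callable[[Model], float]


def accuracy_utility(examples: Sequence[Tuple[str, str]]) -> Utility:
    """Build a utility that scores exact-match accuracy on a validation set."""
    def score(model: Model) -> float:
        hits = sum(model(question) == answer for question, answer in examples)
        return hits / len(examples)
    return score


def scorer_utility(prompts: Sequence[str],
                   scorer: Callable[[str, str], float]) -> Utility:
    """Build a utility from an external scorer, e.g. a reward model."""
    def score(model: Model) -> float:
        return sum(scorer(p, model(p)) for p in prompts) / len(prompts)
    return score


def best_candidate(candidates: Sequence[Model], utility: Utility) -> Model:
    """Pick the candidate with the highest utility; the objective is opaque."""
    return max(candidates, key=utility)


def always_four(prompt: str) -> str:
    return "4"


def calculator(prompt: str) -> str:
    return str(sum(int(term) for term in prompt.split("+")))


validation = [("2+2", "4"), ("3+3", "6")]
accuracy = accuracy_utility(validation)
winner = best_candidate([always_four, calculator], accuracy)
print(accuracy(always_four), accuracy(calculator))  # 0.5 1.0
```

`best_candidate` never inspects what the utility measures, which is the sense in which the objective can be swapped freely.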

The most interesting finding is correctness emergence: new capabilities appear that no initial expert had. Questions where all experts initially answered incorrectly are answered correctly by post-swarm experts. This is not transfer — it is genuinely new capability discovered through search in weight space.
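One way to quantify correctness emergence is to count the questions that every initial expert answered incorrectly and check how many of them a post-search expert answers correctly. The sketch below assumes per-question correctness is available as booleans. The function name, the choice of a single final expert, and the toy data are illustrative assumptions rather than the source's exact metric.

```python
def correctness_emergence(initial, final):
    """Return (emerged, all_wrong) question counts.

    initial: list of per-expert lists of booleans, True if that initial
             expert answered the question correctly.
    final:   list of booleans for one post-search expert.
    """
    n = len(final)
    if any(len(expert) != n for expert in initial):
        raise ValueError("all correctness lists must cover the same questions")
    # Questions that no initial expert answered correctly.
    all_wrong = [q for q in range(n) if not any(e[q] for e in initial)]
    # Of those, the ones the post-search expert now gets right.
    emerged = [q for q in all_wrong if final[q]]
    return len(emerged), len(all_wrong)


# Toy data: two initial experts, four questions.
initial = [
    [True, False, False, False],
    [False, False, True, False],
]
final = [True, True, True, False]

emerged, all_wrong = correctness_emergence(initial, final)
print(emerged, all_wrong)  # 1 2
```

In the toy data, questions 1 and 3 (zero-indexed) were missed by both initial experts, and the final expert recovers question 1, so one of two all-wrong questions emerges as correct.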

Practical results: 17.6% improvement in LLM-as-judge scores, 17.0% in factuality, 70.8% human win rate against initial experts (96% on best domains). MODEL SWARMS also drastically reduces sensitivity to minor prompt changes — improving robustness through weight-space optimization rather than prompt engineering.

Token swarms extend the approach to cross-architecture composition by operating on token probability distributions rather than weights.

Original note title

swarm intelligence in weight space discovers adapted language model experts without training through collaborative search guided by utility functions