INQUIRING LINE


A flock of AI models can discover a blended expert none of them could be individually — no training required.

How many particles and iterations does optimal expert discovery require?

This explores the swarm-search method where LLMs act as 'particles' drifting through weight space to discover composed experts — and what the actual compute budget (swarm size, iterations) for that discovery looks like.

This reads the question as being about a specific, almost playful idea in the corpus: treating language models as a swarm of particles that move through weight space, borrowing the update rules of Particle Swarm Optimization, until they settle on a blended 'expert' that none of the starting models could be alone. The headline result in "Can language models discover new expertise through collaborative weight search?" is that this works with no gradient training at all and only 200 validation examples to steer the swarm; the composed experts even answer questions every initial model got wrong. The note gives no fixed particle or iteration count, so the honest answer to 'how many particles and iterations' is qualitative: surprisingly few, because the method substitutes cheap evaluation signal for expensive backprop. The interesting part isn't the exact number. It's that discovery here is a search problem, not a training problem.
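To make the search-not-training framing concrete, here is a minimal sketch of a particle swarm loop over toy weight vectors. It assumes plain textbook PSO (inertia plus pulls toward each particle's personal best and the swarm's global best), not the exact update rule of the Model Swarms work, and every name in it (`swarm_search`, `toy_utility`, the three-dimensional 'experts', the hidden `target` blend) is hypothetical. The utility function stands in for scoring a model on the validation set.

```python
import random

def swarm_search(experts, utility, iterations=20, inertia=0.5,
                 cognitive=1.0, social=1.0, seed=0):
    """Toy PSO over weight vectors; `utility` stands in for a validation score."""
    rng = random.Random(seed)
    dim = len(experts[0])
    positions = [list(e) for e in experts]
    velocities = [[0.0] * dim for _ in experts]
    personal_best = [list(p) for p in positions]
    personal_score = [utility(p) for p in positions]
    g = max(range(len(positions)), key=lambda i: personal_score[i])
    global_best, global_score = list(personal_best[g]), personal_score[g]
    initial_best_score = global_score

    for _ in range(iterations):
        for i, pos in enumerate(positions):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: keep some momentum, pull toward personal and global bests.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + cognitive * r1 * (personal_best[i][d] - pos[d])
                                    + social * r2 * (global_best[d] - pos[d]))
                pos[d] += velocities[i][d]
            score = utility(pos)  # one evaluation per particle per iteration
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = list(pos), score
            if score > global_score:
                global_best, global_score = list(pos), score
    return global_best, global_score, initial_best_score

# Hypothetical setup: three 'experts' and a hidden blend none of them matches.
experts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [0.4, 0.4, 0.2]

def toy_utility(weights):
    # Higher is better; stands in for accuracy on the validation examples.
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

iterations = 20
best, best_score, start_score = swarm_search(experts, toy_utility, iterations=iterations)
evaluations = len(experts) * (iterations + 1)  # initial scoring plus one per round
print(f"{evaluations} utility evaluations, best utility {start_score:.3f} -> {best_score:.3f}")
```

In this sketch the whole compute budget is particles × (iterations + 1) utility evaluations, with no gradients anywhere, which is the sense in which swarm size and iteration count are the only knobs.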

That reframing connects laterally to a quieter finding in the corpus about *how* search budgets behave. "Do search steps follow the same scaling rules as reasoning tokens?" shows that, for deep research agents, adding more search steps follows the same diminishing-returns curve as adding more reasoning tokens. If swarm search behaves the same way, there is no magic iteration count, just a knee in the curve past which extra particles or rounds buy you little. If you want the swarm to find an optimum efficiently, the question becomes where that knee sits, not how high you can crank the budget.
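One simple way to locate such a knee, sketched here under the assumption that you have already measured a score at several budget levels, is to stop at the last budget whose marginal gain per extra unit still clears a threshold. The function name `find_knee`, the threshold, and the numbers below are all made up for illustration; they are not results from any of the notes.

```python
def find_knee(budgets, scores, min_gain_per_unit=0.01):
    """Return the last budget before marginal gain per unit falls below the threshold."""
    for i in range(1, len(budgets)):
        gain = (scores[i] - scores[i - 1]) / (budgets[i] - budgets[i - 1])
        if gain < min_gain_per_unit:
            return budgets[i - 1]
    return budgets[-1]  # never flattened within the measured range

# Illustrative, invented curve: score versus number of search rounds.
budgets = [1, 2, 4, 8, 16, 32]
scores = [0.40, 0.52, 0.61, 0.66, 0.68, 0.69]

knee = find_knee(budgets, scores)
print(knee)
```

The threshold encodes what an extra round is worth to you, so the knee is a property of the curve and of your cost model together, not a universal constant.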

There's also a sharper way to think about *what kind* of search converges fast. "Can neural networks explore efficiently at recommendation scale?" makes the case that exploration is cheap when you spend compute only on the uncertainty that actually matters (epistemic uncertainty about parameters, not aleatoric noise): by being selective, it needed 29% fewer interactions than baselines while improving click-through rates and ratings. That's the same intuition behind why 200 examples can guide a weight-space swarm: a well-targeted signal collapses the search space faster than brute iteration. And "Can an AI system improve its own search methods automatically?" goes one level further. Instead of fixing search settings such as the number of particles and iterations, it lets an outer loop *rewrite the search mechanism itself*, discovering bandit and combinatorial optimization methods that improved performance on GPT pretraining by 5x. The optimal budget, in other words, may be something you discover rather than specify.
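The selective-exploration idea can be illustrated with Beta-Bernoulli Thompson sampling, a standard bandit method swapped in here as a stand-in; it is not the ENR architecture from the note. In this sketch each arm's Beta posterior captures epistemic uncertainty (it narrows as data arrives), while the coin-flip reward is aleatoric noise that no amount of sampling removes. The function name and the reward rates are hypothetical.

```python
import random

def thompson_sampling(true_rates, rounds=500, seed=0):
    """Beta-Bernoulli Thompson sampling; returns pulls and posterior counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [1] * n_arms    # Beta alpha, starting from a uniform Beta(1, 1) prior
    losses = [1] * n_arms  # Beta beta
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Sample a plausible rate per arm; wide posteriors get sampled high more often.
        samples = [rng.betavariate(wins[a], losses[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_rates[arm]:  # noisy Bernoulli reward
            wins[arm] += 1
        else:
            losses[arm] += 1
        pulls[arm] += 1
    return pulls, wins, losses

# Hypothetical reward rates for three arms.
pulls, wins, losses = thompson_sampling([0.2, 0.5, 0.8])
print(pulls)
```

Arms whose posteriors have already narrowed around a low rate stop being sampled, so the evaluation budget drifts toward the options where uncertainty still affects the decision.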

One note is worth knowing for contrast: "Do large language models actually perform iterative optimization?" is a cautionary finding that LLMs *can't* actually run iterative numerical optimization internally; they pattern-match plausible answers instead. That's why swarm methods like the one above run the iteration *externally*, over a population of real model evaluations, rather than asking a single model to 'optimize in its head.' The particles do the iterating; the models just get evaluated.

If you want the most surprising takeaway: optimal expert discovery doesn't really have a fixed particle count or iteration budget — the frontier work treats both as things to *search for* (via meta-optimization) or *taper off* (via scaling-law knees), and the leverage comes from spending your evaluation budget where uncertainty is highest, not from running the swarm longer.

Sources (5 notes)

Can language models discover new expertise through collaborative weight search?

PSO-inspired swarms of LLM particles moving through weight space discover composed experts with new capabilities—including answering questions all initial experts failed on—using only 200 validation examples and no gradient-based training.

Do search steps follow the same scaling rules as reasoning tokens?

Deep research agents improve with more search steps in a pattern mirroring the reasoning-token relationship, with both exhibiting diminishing returns. This reveals a new inference-compute axis beyond model capability alone.

Can neural networks explore efficiently at recommendation scale?

ENR separates aleatoric from epistemic uncertainty, focusing computation only on parameter uncertainty needed for Thompson sampling. It improved click-through rates 9% and ratings 6% while requiring 29% fewer interactions than baselines.

Can an AI system improve its own search methods automatically?

An outer loop successfully read inner loop code, identified bottlenecks, and generated new Python mechanisms at runtime, discovering combinatorial optimization and bandit methods that broke the inner loop's deterministic patterns and improved performance on GPT pretraining by 5x.

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Bilevel Autoresearch: Meta-Autoresearching Itself (0.92 match)
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents (0.88 match)
Scalable Neural Contextual Bandit for Recommender Systems (0.87 match)
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence (0.87 match)
When More Thinking Hurts: Overthinking in LLM Test-Time Compute Scaling (0.87 match)
Reasoning Models Can Be Effective Without Thinking (0.86 match)
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning (0.85 match)
Large Language Models Think Too Fast To Explore Effectively (0.85 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about particle swarm optimization in weight space for expert discovery. The question remains: how many particles and iterations does optimal expert discovery require?

What a curated library found — and when (findings span 2023–2026, but treat as dated claims):
• Swarm-based expert discovery works with ~200 validation examples and no gradient training; particles converge to blended experts that solve problems no single model could solve alone (~2024-10, arXiv:2410.11163).
• Search budget scaling follows the same diminishing-returns curve as reasoning-token scaling — no magic iteration count, just a knee past which extra particles yield little gain (~2024-10, inferred from search-budget-law work).
• Optimal search is selective: targeting epistemic uncertainty (not noise) reduced interaction count by 29% vs. brute iteration (~2023-06, arXiv:2306.14834).
• Meta-optimization (an outer loop rewriting the search mechanism itself) improved performance on GPT pretraining by ~5x, which suggests the optimal budget may be *discovered*, not fixed (~2026-03, arXiv:2603.23420).
• LLMs cannot execute iterative numerical optimization internally — they pattern-match; swarms must run iteration *externally* over real evaluations (~citation needed: LLMs-cannot-execute work).

Anchor papers (verify; mind their dates):
• arXiv:2410.11163 (2024-10): Model Swarms — core swarm-in-weight-space paper.
• arXiv:2306.14834 (2023-06): Scalable Neural Contextual Bandit — epistemic-uncertainty efficiency.
• arXiv:2603.23420 (2026-03): Bilevel Autoresearch — meta-optimization of search budgets.
• arXiv:2501.06252 (2025-01): Transformer2 — self-adaptive LLMs (check relevance to discovery regime).

Your task:
(1) RE-TEST EACH CONSTRAINT. For the ~200-example claim and the "no gradient" finding: have newer model scales, in-context learning, or orchestration (e.g., caching repeated evaluations) since *relaxed* the data or compute requirement? Has the "knee in the curve" been empirically mapped for any recent swarm variant? Separate the durable question (does swarm search beat training for expert blend?) from the perishable limit (does it still require ~200 examples?).
(2) Surface work from the last ~6 months that *contradicts* the "search-not-training" framing — e.g., do any recent papers show that fine-tuning a single model outperforms swarm discovery under realistic compute budgets?
(3) Propose two research questions assuming the regime has moved: (a) Can meta-optimization *dynamically* allocate particles mid-search based on convergence signals? (b) Do swarms discover the same expert in weight space as distillation from an ensemble?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
