INQUIRING LINE

Separate teams building AI agents keep rediscovering the same three efficiency rules — is that convergence a clue about something deeper?

Is agentic efficiency analogous to convergent evolution in biology?

This explores whether the way efficiency techniques independently arrive at the same solutions across different agent components mirrors how unrelated species evolve similar traits under similar pressures.

The question is whether agentic efficiency is a case of convergent evolution, with independent systems landing on the same design under shared pressure, rather than a set of clever, unrelated tricks. The corpus makes a surprisingly strong case for the analogy. The clearest evidence is that efficiency techniques for memory, tool use, and planning, developed by separate research communities, keep arriving at the *same* three principles: bound your context, minimize external calls, and control your search (see "Do efficiency techniques across agent components reveal shared structural constraints?"). In biology, when fins, wings, and streamlined bodies recur across unrelated lineages, we read that as evidence of a fundamental constraint (water, air, gravity) rather than coincidence. The same logic applies here: when independent optimizations converge, the convergence itself is the signal that something structural is forcing the outcome.

But the analogy gets more interesting when you notice that these three axes are *orthogonal*. Improving memory compression does nothing for tool-learning efficiency or planning depth, because each has its own cost currency: tokens for memory, latency for tool learning, steps for planning (see "Does agent efficiency really break down into three distinct components?"). That's the convergent-evolution paradox in miniature: the *principles* converge (everything wants to be cheaper and tighter) while the *organs* stay distinct (an eye and a wing solve different problems). The pressure is shared; the adaptations are not interchangeable.

Where the corpus pushes the analogy further is in the literal use of evolution as a mechanism, not just a metaphor. Mind Evolution runs genetic algorithms (LLM-generated mutations and crossovers, plus an island model to preserve diversity) and beats both Best-of-N sampling and sequential revision on planning benchmarks, because a single refinement trajectory collapses into premature convergence the way an inbred population loses fitness (see "Can evolutionary search beat sampling and revision at inference time?"). So the biological framing isn't decoration; population diversity, selection pressure, and convergence-vs-collapse are doing real explanatory work. You can even see selection pressure producing cooperation: agents trained against diverse co-players develop best-response strategies that resolve into cooperation through mutual vulnerability, with no hardcoding required. That is an emergent trait under environmental pressure, exactly the shape of an evolved behavior (see "Can agents learn cooperation by adapting to diverse partners?").
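To make the island-model idea concrete, here is a minimal sketch of a genetic search with several isolated populations and periodic migration. It is not Mind Evolution's implementation: the toy string-matching objective, the random character mutation (standing in for an LLM-proposed revision), and every name and parameter below (`TARGET`, `evolve_islands`, `migrate_every`, and so on) are assumptions chosen only for illustration.

```python
import random

# Hypothetical toy objective: evolve a string toward TARGET.
TARGET = "bound context"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate: str) -> int:
    """Count positions that match TARGET (stand-in for a real evaluator)."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(candidate: str, rng: random.Random) -> str:
    """Replace one random character (stand-in for an LLM-generated mutation)."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two parents of equal length."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve_islands(n_islands=4, pop_size=20, generations=200,
                   migrate_every=10, seed=0) -> str:
    rng = random.Random(seed)
    # Each island starts from its own random population.
    islands = [
        ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
        for _ in range(n_islands)
    ]
    for gen in range(1, generations + 1):
        for idx, pop in enumerate(islands):
            # Selection: keep the fitter half, refill with mutated offspring.
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]
            children = [
                mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                for _ in range(pop_size - len(parents))
            ]
            islands[idx] = parents + children
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces its neighbour's worst.
            bests = [max(pop, key=fitness) for pop in islands]
            for idx, pop in enumerate(islands):
                pop.sort(key=fitness, reverse=True)
                pop[-1] = bests[idx - 1]
    return max((c for pop in islands for c in pop), key=fitness)


if __name__ == "__main__":
    best = evolve_islands()
    print(best, fitness(best))
```

The design point is that islands evolve separately between migrations, so one early winner cannot immediately take over every population. That separation is the diversity-preserving role the island model plays in the description above.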

Where the analogy frays is around the source of improvement. Biological evolution needs no external designer; the environment is the only judge. But pure self-improvement in agents stalls: the generation-verification gap, diversity collapse, and reward hacking mean reliable gains always smuggle in an *external* anchor, such as a past model version, a third-party judge, user corrections, or tool feedback (see "Can models reliably improve themselves without external feedback?"). That external signal is the fitness function the agent can't supply for itself, and Reflexion shows the cleanest version: agents improve only when the environment hands them an unambiguous success/failure signal they can't rationalize away (see "Can agents learn from failure without updating their weights?"). So the better biological reading may be artificial selection, in which a breeder chooses which variants survive, rather than blind natural selection.
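The role of the external signal can be sketched as a small retry loop in the spirit of Reflexion. This is a hedged illustration, not the paper's code: `reflexion_loop`, the toy candidate strategies, and the three callables are hypothetical stand-ins for an LLM actor, an environment check, and an LLM-written self-diagnosis.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    act: Callable[[List[str]], str],
    env_check: Callable[[str], bool],
    reflect: Callable[[str], str],
    max_episodes: int = 5,
) -> Tuple[bool, List[str]]:
    """Retry a task, carrying verbatim reflections forward between episodes."""
    memory: List[str] = []  # reflections are kept uncompressed, not summarized
    for _ in range(max_episodes):
        attempt = act(memory)
        if env_check(attempt):  # external, binary success/failure signal
            return True, memory
        memory.append(reflect(attempt))  # no weight update, only a new note
    return False, memory


# Toy stand-ins (assumptions for illustration only).
CANDIDATES = ["retry blindly", "call the tool twice", "check the input schema first"]


def toy_act(memory: List[str]) -> str:
    """Pick the first strategy that no earlier reflection mentions."""
    for candidate in CANDIDATES:
        if not any(candidate in note for note in memory):
            return candidate
    return CANDIDATES[-1]


def toy_env(attempt: str) -> bool:
    """The environment, not the agent, decides what counts as success."""
    return attempt == "check the input schema first"


def toy_reflect(attempt: str) -> str:
    return f"'{attempt}' failed; try something else"


solved, notes = reflexion_loop(toy_act, toy_env, toy_reflect)
```

In this sketch the agent never grades itself. Only `env_check` can end the loop with success, which is the "someone outside holding the fitness function" arrangement the paragraph describes.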

The payoff for a curious reader: efficiency in agents may not be a grab-bag of optimizations but a set of attractors that any sufficiently pressured system falls into. That would fit the observation that, in deep research agents, search steps scale in much the same way as reasoning tokens (see "How does test-time scaling work for individual research agents?"). If that's right, the practical move isn't to invent new tricks but to expect the constraints to keep reproducing the same solutions, and to remember that, unlike nature, these systems still need someone outside holding the fitness function.

Sources (7 notes)

Do efficiency techniques across agent components reveal shared structural constraints?

Techniques for memory, tool learning, and planning independently converge on shared principles: context bounding, minimizing external calls, and controlled search. This convergence suggests these reflect fundamental structural pressures in agentic computation rather than component-specific optimizations.

Does agent efficiency really break down into three distinct components?

Research identifies memory compression, tool learning efficiency, and planning optimization as three structurally independent components, each with distinct cost profiles (tokens, latency, and steps). Improving one axis does not automatically improve the others, requiring holistic design.

Can evolutionary search beat sampling and revision at inference time?

Mind Evolution uses genetic algorithms with LLM-generated mutations and crossovers to significantly outperform Best-of-N and Sequential Revision on planning benchmarks. An island model sustains population diversity, preventing the premature convergence that single-trajectory refinement exhibits.

Can agents learn cooperation by adapting to diverse partners?

Sequence model agents trained against diverse co-players develop in-context best-response strategies that naturally resolve into cooperation. Mutual vulnerability to exploitation creates pressure that drives cooperative mutual adaptation without hardcoded assumptions or timescale separation.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

How does test-time scaling work for individual research agents?

Research shows that deep research agents exhibit test-time scaling laws where search steps scale similarly to reasoning tokens, and live search outperforms memorized retrieval on knowledge-intensive tasks. Data efficiency is extreme—78 curated demonstrations outperform 10K samples for agency.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning (1.70 match)
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver (1.67 match)
Towards a Science of Scaling Agent Systems (1.66 match)
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries (1.64 match)
Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets (1.63 match)
From Model Scaling to System Scaling: Scaling the Harness in Agentic AI (1.62 match)
How we built our multi-agent research system (1.62 match)
Evolving Deeper LLM Thinking (0.92 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether agentic efficiency exhibits convergent evolution. The question remains open: do independent agent optimizations converge on the same principles because of shared structural constraints, or are they isolated tricks that happen to cluster?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026. The library reports:
• Three orthogonal efficiency axes (memory compression, tool-learning latency, planning depth) recur independently across agent architectures, suggesting structural convergence rather than coincidence (~2026).
• Evolutionary search at inference time (mutation, crossover, island models) beats Best-of-N and sequential revision by preserving population diversity, implying biological framing does real explanatory work (~2025).
• Pure self-improvement stalls due to generation-verification gaps and reward hacking; reliable gains always require external anchors (past model, third-party judge, environment feedback) — an artificial-selection regime, not blind evolution (~2024–2025).
• Reflexion demonstrates agents improve only when environments provide unambiguous success/failure signals they cannot rationalize away (~2024).
• Scaling curves for reasoning tokens and search budget follow the same trajectory, suggesting attractors force convergence (~2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2601.14192 (2026-01): Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning
• arXiv:2501.09891 (2025-01): Evolving Deeper LLM Thinking
• arXiv:2412.02674 (2024-12): Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
• arXiv:2602.16301 (2026-02): Multi-agent cooperation through in-context co-player inference

Your task:
(1) RE-TEST EACH CONSTRAINT. For the claim that self-improvement requires external anchors, has newer tooling (SDKs, evaluator APIs, multi-agent orchestration), training methods (preference learning, constitutional AI), or larger inference budgets since 2024–2025 enabled agents to *internalize* a fitness function or bootstrap their own? Separately: has evolutionary inference actually scaled beyond toy domains, or does it collapse in cost/diversity trade-off? Isolate what still constrains convergence from what may now be relaxed.
(2) Surface the strongest work from the last ~6 months that *contradicts* the biological analogy—either by showing agentic efficiency is NOT constraint-driven, or that agents can self-improve without external signals.
(3) Propose 2 research questions assuming the regime has shifted: e.g., "If agents can now maintain internal reward models, does the convergence pattern dissolve or sharpen?" and "Do multi-agent systems exhibit speciation (divergent efficiency strategies) or convergent optimization?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
