SYNTHESIS NOTE
Agentic Systems and Tool Use

Why does agent efficiency differ from model size reduction?

Explores why making models smaller doesn't solve agent cost problems. Agents loop recursively, compounding costs multiplicatively, so efficiency requires system-level design, not just parameter reduction.

Synthesis note · 2026-05-18 · sourced from Agents

Toward Efficient Agents makes a definitional point that resolves a common confusion. In the LLM context, "efficient" has typically meant "smaller model": distillation, quantization, sparser attention, anything that reduces per-token inference cost. For agentic systems, this is the wrong frame.

The reason is structural. A standard LLM in single-turn query-response operates linearly: input goes in, output comes out, and cost is roughly proportional to context length plus output length. An agent operates recursively: it queries the model, observes the response, decides on actions, executes tools, reads the results, queries the model again, and so on. Cost across this loop grows at least linearly in the number of steps, and quadratically or worse when context accumulates, because each call re-reads everything appended by earlier turns. Under that accumulation, a 7B-parameter model running an agent loop for 50 steps can consume far more than 50 times the resources of the same model answering one question.
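The accumulation argument above can be made concrete with a toy token count. This is a minimal sketch, not a measurement: it assumes every call re-reads the full history, ignores prompt caching and any price difference between input and output tokens, and all function names and token counts are hypothetical.

```python
def single_turn_tokens(prompt_tokens: int, output_tokens: int) -> int:
    """Tokens processed by one query-response call."""
    return prompt_tokens + output_tokens


def agent_loop_tokens(prompt_tokens: int, output_tokens: int,
                      observation_tokens: int, steps: int) -> int:
    """Total tokens processed when each step re-reads the whole history.

    Assumption: each step appends the model output plus one tool
    observation to the context, and nothing is ever pruned.
    """
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        # This call reads the current context and writes one output.
        total += context + output_tokens
        # The output and the tool result are carried into the next call.
        context += output_tokens + observation_tokens
    return total


if __name__ == "__main__":
    # Hypothetical sizes: 1000-token prompt, 200-token outputs,
    # 800-token tool observations, 50 steps.
    one = single_turn_tokens(1000, 200)
    loop = agent_loop_tokens(1000, 200, 800, 50)
    print(one, loop, loop / one)
```

With these illustrative numbers the single call processes 1,200 tokens and the 50-step loop processes 1,285,000, roughly 1,070 times as many rather than 50 times.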

This makes "smaller model" a marginal optimization for agentic systems. Halving the model size roughly halves per-call cost but does not address the multi-step accumulation. A truly efficient agent has to be optimized at the system level: what triggers the recursion, when it stops, how much state each turn carries forward, and how much can be pruned at each step.
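To compare the two levers, the sketch below uses a closed form for total tokens in a loop where context accumulates without pruning. It rests on stated assumptions: per-token cost scales linearly with model size, halving the step count leaves the task solvable, and every name and number is hypothetical.

```python
def loop_tokens(prompt: int, output: int, observation: int, steps: int) -> int:
    """Closed-form total tokens for a loop with unpruned history.

    Each of the `steps` calls reads the prompt and writes one output,
    and call k (0-based) also re-reads k earlier (output + observation) turns:
        steps * (prompt + output) + (output + observation) * steps * (steps - 1) / 2
    """
    growth = output + observation
    return steps * (prompt + output) + growth * steps * (steps - 1) // 2


def relative_cost(tokens: int, model_scale: float) -> float:
    """Cost in arbitrary units, assuming per-token cost is proportional to model size."""
    return tokens * model_scale


if __name__ == "__main__":
    baseline = relative_cost(loop_tokens(1000, 200, 800, 50), 1.0)
    half_model = relative_cost(loop_tokens(1000, 200, 800, 50), 0.5)
    half_steps = relative_cost(loop_tokens(1000, 200, 800, 25), 1.0)
    print(baseline, half_model, half_steps)
```

Under these assumptions, halving the model cuts cost from 1,285,000 to 642,500 units, while halving the steps cuts it to 330,000 units, because the quadratic term shrinks by about a factor of four.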

The right metric is not per-token throughput but the Pareto frontier between effectiveness (task success rate) and cost (latency, tokens, tool invocations, and dollar cost). An agent that completes the task in 5 steps with a larger model can be more efficient than one that completes it in 50 steps with a smaller model. Model size is a knob, not the answer.
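One way to read the Pareto framing is as a filter over candidate agent configurations: keep only those that no other configuration beats on both success rate and cost. The sketch below assumes cost has already been collapsed into a single scalar, which the note does not specify, and the configurations and their numbers are invented purely for illustration.

```python
from typing import List, NamedTuple


class AgentConfig(NamedTuple):
    name: str
    success_rate: float  # higher is better
    cost: float          # lower is better; hypothetical scalar cost per task


def dominates(a: AgentConfig, b: AgentConfig) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    no_worse = a.success_rate >= b.success_rate and a.cost <= b.cost
    strictly_better = a.success_rate > b.success_rate or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(configs: List[AgentConfig]) -> List[AgentConfig]:
    """Return the configurations that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


if __name__ == "__main__":
    # Made-up numbers, not measurements.
    candidates = [
        AgentConfig("small-model-50-steps", 0.70, 8.0),
        AgentConfig("large-model-5-steps", 0.85, 3.0),
        AgentConfig("small-model-10-steps", 0.60, 1.5),
        AgentConfig("large-model-50-steps", 0.86, 30.0),
    ]
    for config in pareto_frontier(candidates):
        print(config.name)
```

In this invented data the 5-step large-model agent dominates the 50-step small-model agent, so the smaller model drops off the frontier despite its lower per-call cost.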

For deployment, this argues against the reflexive "downsize the model" approach to agentic-system cost reduction. The right intervention is usually structural — reduce steps, compress memory, eliminate unnecessary tool calls, plan better. Model size cuts come last and offer the least leverage for the cost they impose on capability.
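As one illustration of a structural intervention, the sketch below models memory compression as a sliding window that keeps only the most recent turns in context. The windowing policy, the function name, and the token counts are all assumptions made for illustration; a real agent might summarize older turns instead of dropping them.

```python
def windowed_loop_tokens(prompt_tokens: int, output_tokens: int,
                         observation_tokens: int, steps: int,
                         window: int) -> int:
    """Total tokens processed when only the last `window` turns stay in context.

    Assumption: each turn adds one model output and one tool observation,
    and turns older than `window` are dropped rather than summarized.
    """
    per_turn = output_tokens + observation_tokens
    total = 0
    for step in range(steps):
        kept_turns = min(step, window)          # turns still in context
        context = prompt_tokens + kept_turns * per_turn
        total += context + output_tokens        # read context, write output
    return total


if __name__ == "__main__":
    unpruned = windowed_loop_tokens(1000, 200, 800, 50, window=50)
    pruned = windowed_loop_tokens(1000, 200, 800, 50, window=5)
    print(unpruned, pruned)
```

With these hypothetical sizes, a 5-turn window brings the 50-step total from 1,285,000 tokens down to 295,000, since per-call context stops growing once the window is full and total cost becomes linear in the number of steps.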

Original note title

efficient agent is system-level optimization for the success-versus-cost Pareto frontier — distinct from smaller model because agent recursion consumes resources exponentially beyond single-turn use