SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use

Do tools actually expand what language models can reason about?

Explores whether tool access fundamentally breaks through the reasoning limits of pure-text models, or merely optimizes existing capabilities. Understanding this distinction clarifies whether tools are a luxury feature or a necessity for genuine capability growth.

Synthesis note · 2026-06-03 · sourced from Reinforcement Learning

Tool-Integrated Reasoning (TIR), in which a model calls a Python interpreter or other external tool mid-reasoning, reliably outperforms pure-text reasoning, but the field has demonstrated this empirically without a principled account of why and when it helps. The paper summarized here supplies that account as a proof: TIR strictly expands both the model's empirical support and its feasible support, breaking the "invisible leash" that constrains pure-text models. Tools make complex algorithmic strategies practically achievable within a finite token budget, including strategies that are otherwise impossible or intractably verbose to express in text alone. Crucially, the advantage is not confined to compute-heavy arithmetic; it extends to problems requiring abstract insight.
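The token-budget point can be illustrated with a toy comparison. The sketch below is not taken from the paper: it uses character count as a crude stand-in for token cost, and the names `PROGRAM`, `text_trace`, and `compare` are hypothetical. It contrasts a fixed-size program that counts Collatz steps with the text a pure-text reasoner would need to write out every intermediate value.

```python
# Hypothetical illustration only: character count is a rough proxy for tokens.

# The "tool" strategy: a short program whose size does not depend on the input.
PROGRAM = """
def collatz_steps(n):
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
"""


def text_trace(n):
    """Write out every intermediate value, one line per step, as plain text."""
    lines = []
    while n != 1:
        nxt = n // 2 if n % 2 == 0 else 3 * n + 1
        lines.append(f"{n} -> {nxt}")
        n = nxt
    return "\n".join(lines)


# Load the program the way an interpreter tool would: execute its source.
_namespace = {}
exec(PROGRAM, _namespace)
collatz_steps = _namespace["collatz_steps"]


def compare(n):
    """Return step count, program size, and text-trace size for input n."""
    trace = text_trace(n)
    return {
        "steps": collatz_steps(n),
        "program_chars": len(PROGRAM),
        "trace_chars": len(trace),
    }


if __name__ == "__main__":
    print(compare(27))
```

The program's size stays constant while the written trace grows with the number of steps, which is the sense in which a strategy can be cheap to express as code yet verbose in text. This only illustrates the intuition; it says nothing about the formal support argument itself.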

On the training side, the paper finds that reward shaping for TIR is unstable and proposes Advantage Shaping Policy Optimization (ASPO), which modifies the advantage function directly, rather than the reward, to guide behavior without destabilizing training.
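This note does not give ASPO's actual formula, so the following is only a sketch of the general idea of shaping advantages instead of rewards. It assumes a group-normalized baseline (mean and standard deviation over a group of sampled responses) and a hypothetical per-response `behavior_scores` signal; `coeff` and `max_frac` are made-up parameters. The shaping term is added after normalization and clipped to a fraction of the base advantage, so it can nudge behavior without overriding the correctness signal.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std over one group of samples.

    Assumption: this baseline is chosen for illustration, not taken from the note.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def shaped_advantages(rewards, behavior_scores, coeff=0.1, max_frac=0.5):
    """Add a bounded behavior bonus to each advantage, leaving rewards untouched.

    behavior_scores: hypothetical per-sample signal (positive = encouraged).
    The bonus is clipped to +/- max_frac * |advantage|, so with max_frac < 1
    it can never flip the sign set by the task reward.
    """
    shaped = []
    for adv, score in zip(group_advantages(rewards), behavior_scores):
        bound = max_frac * abs(adv)
        bonus = max(-bound, min(bound, coeff * score))
        shaped.append(adv + bonus)
    return shaped


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0, 0.0]
    scores = [1.0, -1.0, -1.0, 1.0]
    print(shaped_advantages(rewards, scores))
```

Because the bonus never enters the reward, it is not rescaled by the group statistics, and the clipping keeps correct responses with a positive advantage and incorrect ones with a negative advantage regardless of the behavior signal.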

This is the reasoning-side companion to "Can models store unlimited facts without growing larger?": one proof concerns factual capacity, this one concerns reasoning reach. Together they give a formal foundation for why agentic harnesses beat bigger models, and they sharpen "Does the reasoning cliff depend on how we test models?", which observed empirically that tool access dissolves apparent reasoning ceilings.

Original note title

tool-integrated reasoning provably expands an LLM capability frontier — tools unlock strategies impossible or intractably verbose in pure text