SYNTHESIS NOTE

Why do transformers need explicit chain-of-thought reasoning?

Explores whether chain-of-thought is a fundamental reasoning mechanism or a workaround for architectural limitations in how transformers track evolving state across computation steps.

Synthesis note · 2026-06-27 · sourced from Reasoning Architectures

The argument here is structural, not empirical, and it recasts much of the reasoning literature. State tracking, the iterative updating of latent variables as an environment evolves, s_t = f(s_{t-1}, x_t), is inherently sequential. A purely feedforward transformer cannot perform that update in place: with each new input step it must push the evolving state representation deeper into its layer stack, which renders earlier state inaccessible in shallow layers and eventually exhausts the model's finite depth. On this view, the entire apparatus of explicit chain-of-thought and latent "thinking" is not the mechanism of reasoning but a workaround: it externalizes state into the token stream because the architecture cannot hold it internally. The proposed fix is to shift focus from explicit thought traces to implicit recurrent activation dynamics, with a taxonomy organized by recurrence axis (depth vs. step) and by the ratio of input tokens to recurrence steps.
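The update s_t = f(s_{t-1}, x_t) and its externalized counterpart can be made concrete with a toy task. The sketch below is an illustration added here, not code from the source: the ball-under-cups task and the names `f`, `track_recurrent`, and `track_externalized` are all hypothetical. It contrasts holding the state in a single variable with writing every intermediate state into a transcript and reading it back, which is the pattern the note attributes to chain-of-thought.

```python
from typing import List, Tuple

Swap = Tuple[int, int]  # x_t: the two cup positions exchanged at step t


def f(state: int, swap: Swap) -> int:
    """One state update s_t = f(s_{t-1}, x_t): the ball's position after a swap."""
    a, b = swap
    if state == a:
        return b
    if state == b:
        return a
    return state


def track_recurrent(start: int, swaps: List[Swap]) -> int:
    """Hold the state internally and overwrite it in place at every step."""
    state = start
    for x in swaps:
        state = f(state, x)
    return state


def track_externalized(start: int, swaps: List[Swap]) -> Tuple[int, List[str]]:
    """Keep no internal state: each step re-reads the last written state
    from the transcript and appends the new one (a stand-in for CoT tokens)."""
    transcript = [f"ball={start}"]
    for x in swaps:
        prev = int(transcript[-1].split("=")[1])
        transcript.append(f"ball={f(prev, x)}")
    return int(transcript[-1].split("=")[1]), transcript


swaps = [(0, 1), (1, 2), (0, 2), (1, 2)]
final_recurrent = track_recurrent(0, swaps)
final_external, transcript = track_externalized(0, swaps)
extra_tokens = len(transcript) - 1  # one written state per input step
```

Both routes reach the same final position, but the externalized version writes one extra state entry per input step. That per-step cost is what the note later calls a token tax.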

This is the theoretical spine for the vault's recurrence cluster. "How do looped language models actually improve reasoning in depth?" gives the mechanistic picture of what depth-axis recurrence is doing; "Can tiny recursive networks outperform massive language models?" is the existence proof that recursion on latent state beats scale on exactly the state-heavy tasks this paper predicts will exhaust feedforward depth. "Can looped transformers generalize to unseen knowledge combinations?" supplies the "cannot": a capability gap closed only by recurrence.

The counterargument the paper must answer is "Can state-space models match transformers at copying and retrieval?": recurrent fixed-size state has its own provable ceiling on copying and retrieval. So the honest synthesis is not "recurrence beats attention" but a division of labor: attention's expanding context is right for retrieval, recurrence is right for state tracking, and conflating the two is what makes both CoT externalization and pure SSMs disappoint. The provocative line for writing: the field has been paying a token tax to simulate a state-update operation the hardware should perform natively.
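One way to see why a fixed-size state has a ceiling on copying is a plain counting argument. It is offered here as an illustration and is not the proof from the linked note. A state of b bits takes at most 2^b distinct values, so it cannot losslessly distinguish more than 2^b input sequences. The helper `max_copyable_length` below is a hypothetical name, and the sketch assumes an idealized state that uses every bit perfectly, so real models would fall below this bound.

```python
def max_copyable_length(state_bits: int, vocab_size: int) -> int:
    """Largest n such that every length-n sequence over the vocabulary can
    map to a distinct state value, i.e. vocab_size**n <= 2**state_bits."""
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    if state_bits < 0:
        raise ValueError("state_bits must be non-negative")
    n = 0
    # Integer arithmetic avoids floating-point error in log2 comparisons.
    while vocab_size ** (n + 1) <= 2 ** state_bits:
        n += 1
    return n


# Idealized upper bounds for a 16-bit state at two vocabulary sizes.
small_vocab_limit = max_copyable_length(16, 4)
byte_vocab_limit = max_copyable_length(16, 256)
```

An attention context that grows with the input has no fixed bound of this kind, since its storage scales with sequence length. This is consistent with the division of labor above: growing context for retrieval and copying, compact recurrent state for tracking.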

Original note title

explicit chain-of-thought is an inefficient workaround for a topological deficiency — feedforward transformers cannot track evolving state without recurrence