SYNTHESIS NOTE

Why does latent chain-of-thought fail so easily in training?

Explores why latent reasoning is fragile compared to textual chain-of-thought, focusing on how outcome-only supervision creates gradient starvation and representational drift in learned reasoning trajectories.

Synthesis note · 2026-06-27 · sourced from Cognitive Models Latent

Why is latent chain-of-thought so hard to train robustly when textual CoT is comparatively easy? The source paper gives an information-theoretic answer: latent CoT fails by a dual collapse. Outcome-only supervision (reward the final answer, ignore the trajectory) produces (1) gradient attenuation along the optimization path, because the signal sits too far from the latent steps to shape them, and (2) representational drift, where the latent trajectory wanders without a semantic tether. The fix decomposes into two complementary axes: Trajectory Supervision (inject dense stepwise signal) and Space Supervision (preserve the geometry of the latent manifold). The sharp, non-obvious finding is that the way space supervision is done matters: rigid geometric compression collapses the high-dimensional reasoning manifold onto sparse static points, while generative reconstruction acts as a flexible semantic anchor that preserves intrinsic dimensionality.
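The gradient-attenuation half of the collapse can be pictured with a toy scalar chain, sketched here in Python. This is an illustrative assumption, not the paper's model: the latent state follows h[t+1] = a * h[t] with a hypothetical contraction factor `a`, the loss is squared error against a fixed target, and the function names `outcome_only_grads` and `dense_grads` are invented for this sketch. With a loss on the final state only, the gradient reaching step t is scaled by a^(T-t), so early steps receive almost no signal; adding a per-step loss gives every state a direct error term.

```python
def forward(a, steps, h0):
    """Roll out the toy latent chain h[t+1] = a * h[t] (an assumed model)."""
    hs = [h0]
    for _ in range(steps):
        hs.append(a * hs[-1])
    return hs


def outcome_only_grads(a, steps, h0=1.0, target=1.0):
    """dL/dh[t] when L = 0.5 * (h[T] - target)^2, i.e. final-answer loss only."""
    hs = forward(a, steps, h0)
    grads = [0.0] * (steps + 1)
    grads[steps] = hs[steps] - target
    # Backpropagate through the chain: each step multiplies the signal by a.
    for t in range(steps - 1, -1, -1):
        grads[t] = a * grads[t + 1]
    return grads


def dense_grads(a, steps, h0=1.0, target=1.0):
    """dL/dh[t] when L = sum_t 0.5 * (h[t] - target)^2, i.e. stepwise loss."""
    hs = forward(a, steps, h0)
    grads = [0.0] * (steps + 1)
    grads[steps] = hs[steps] - target
    # Each state gets its own local error plus the signal flowing back.
    for t in range(steps - 1, -1, -1):
        grads[t] = (hs[t] - target) + a * grads[t + 1]
    return grads


if __name__ == "__main__":
    outcome = outcome_only_grads(a=0.5, steps=10)
    dense = dense_grads(a=0.5, steps=10)
    for t in (1, 5, 10):
        print(f"step {t:2d}: outcome-only {abs(outcome[t]):.6f}  dense {abs(dense[t]):.6f}")
```

In this toy setting the two schemes agree at the final step and diverge sharply at early steps, which is the qualitative pattern the note attributes to outcome-only supervision. It does not model representational drift, and real latent-CoT training involves high-dimensional states and nonlinear updates.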

This connects two threads that rarely meet. The trajectory-supervision half is the latent-space analogue of the process-vs-outcome reward debate the vault already holds. The note "Why do outcome-based reward models fail at intermediate step evaluation?" establishes that outcome supervision underserves intermediate steps in token space; here the same pathology appears in latent space, as gradient attenuation. The note "Can trajectory structure replace hand-annotated process rewards?" echoes the move from sparse outcome signal to dense process signal without expensive annotation. The space-supervision half explains why latent reasoning is harder than verbal reasoning: the medium of the latent chain, which "Can continuous thoughts have tractable likelihoods for sampling and scoring?" tries to make scorable, has no built-in semantic floor, so it needs an explicit anchor that text gets for free from the vocabulary.

The unifying claim is an Information–Performance Binding, measured by a Unified Latent Probe (mutual information between the latent trajectory and the explicit reasoning steps): reasoning accuracy is strictly bounded by the information fidelity the latent chain retains. The strongest counterargument is that this re-tethers latent reasoning to explicit steps. If the latent chain only works when it preserves high mutual information with a verbal trace, the headline efficiency of going non-verbal is partly an illusion, and the gain comes from dense supervision rather than from the latent medium itself. Either way, it reframes latent-CoT design from "pick an architecture" to "preserve information along the chain."
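To make the quantity behind the probe concrete, here is a minimal Python sketch of a plug-in mutual information estimate between two paired discrete sequences. It is a stand-in under stated assumptions, not the paper's Unified Latent Probe, whose construction this note does not specify: it assumes the latent states have already been discretized into codes (for example by clustering) and that each one is paired with a label for the explicit reasoning step it should correspond to. The function name `mutual_information` and the example data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    xs: discretized latent codes (assumed preprocessing, not from the note)
    ys: labels of the explicit reasoning steps paired with those codes
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    steps = ["parse", "add", "parse", "add"]
    faithful_codes = [0, 1, 0, 1]   # codes track the explicit step exactly
    drifted_codes = [0, 0, 1, 1]    # codes carry no information about the step
    print(mutual_information(faithful_codes, steps))
    print(mutual_information(drifted_codes, steps))
```

The plug-in estimator is biased upward on small samples and only applies to discrete variables, so a probe over continuous latent vectors would need a different estimator. The sketch is only meant to show what "information the latent chain retains" measures: a trajectory that tracks the explicit steps scores high, and a drifted one scores near zero.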

Original note title

latent chain-of-thought fails by dual collapse — outcome supervision starves gradients along the trajectory and lets the latent space drift, so reasoning accuracy is bounded by the mutual information the latent chain retains