What Makes Effective Supervision in Latent Chain-of-Thought? An Information-Theoretic Analysis

Paper · arXiv 2606.20075 · Published June 18, 2026

Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome supervision provides weak learning signals and leaves latent trajectories prone to semantic drift. In this work, we analyze Latent CoT from an information-theoretic perspective and identify this failure as a dual collapse: gradient attenuation along the optimization path and representational drift in the latent space. We further decompose process supervision into two complementary dimensions: Trajectory Supervision, which injects dense stepwise reasoning signals, and Space Supervision, which preserves the semantic structure of the latent manifold. Our analysis shows that rigid geometric compression can collapse the reasoning space, whereas generative reconstruction provides a more flexible semantic anchor that better preserves information capacity. To measure these effects, we introduce the Unified Latent Probe (ULP), which quantifies the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal a clear Information–Performance Binding: reasoning accuracy depends on the information fidelity preserved in the latent chain.
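To make the quantity behind such a probe concrete, the following is a minimal sketch of a plug-in mutual information estimate between two paired discrete sequences. It is not the paper's ULP. It assumes, purely for illustration, that latent states have already been discretized into cluster ids and that each explicit reasoning step carries a discrete label; the function name `mutual_information_bits` and both input encodings are hypothetical.

```python
import math
from collections import Counter


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    xs: hypothetical cluster ids of latent states (assumption, not from the paper)
    ys: hypothetical labels of the aligned explicit reasoning steps
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one paired sample")

    joint = Counter(zip(xs, ys))  # empirical counts of (x, y) pairs
    px = Counter(xs)              # empirical marginal counts of x
    py = Counter(ys)              # empirical marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    latent_ids = [0, 0, 1, 1]
    step_labels = ["add", "add", "mul", "mul"]
    print(mutual_information_bits(latent_ids, step_labels))
```

The plug-in estimator is biased upward on small samples and does not apply directly to continuous hidden states, so a practical probe would likely need a learned or variational estimator instead.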

Introduction. Large Language Models (LLMs) have achieved strong performance on complex reasoning tasks by generating explicit chain-of-thought (CoT) sequences (Wei et al., 2022; Guo et al., 2025; Chen et al., 2025b). However, representing reasoning as verbose sequences of natural language introduces intrinsic constraints. First, explicit CoT suffers from expressive redundancy: many tokens in a reasoning chain are syntactically necessary but functionally irrelevant, leading to inflated sequence lengths without proportional gains in reasoning quality (Feng et al., 2025; Li et al., 2025). Second, it imposes a semantic bottleneck: abstract, continuous, and compositional reasoning processes cannot be faithfully represented within natural language space, which inevitably causes information loss (Chen et al., 2025d;a; Sun et al., 2025). These limitations have motivated Latent Chain-of-Thought, which internalizes reasoning within continuous hidden states rather than externalizing it as text (Hao et al., 2025; Shen et al., 2025).

Discussion / Conclusion. Our analysis establishes structural scaffolding as the prerequisite for effective supervision, modeled here as mutual information maximization. In trajectory control, we find that outcome supervision degenerates into shortcuts due to unconstrained optimization, whereas Process Supervision succeeds by maximizing local stepwise information to minimize conditional entropy, effectively retaining predictability within the latent manifold. Regarding space supervision, we show that rigid Geometric Compression acts as a destructive constraint, collapsing the high-dimensional reasoning manifold onto sparse static points. In contrast, Generative Reconstruction serves as a flexible semantic tether; by optimizing reconstructibility, it preserves intrinsic dimensionality. Ultimately, we confirm a rigorous information-performance binding: reasoning capability is strictly bounded by the mutual information retained in the latent chain.
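The link between maximizing mutual information and minimizing conditional entropy, and one standard way in which mutual information can bound accuracy, can be written out with textbook identities. The symbols here are assumptions introduced for illustration and are not taken from the paper: Z denotes the latent chain, S an explicit reasoning step, Y a discrete answer drawn from a finite set \mathcal{Y}, and P_e the error probability of any predictor of Y that sees only Z. Entropies are in bits. This is a sketch of the general principle, not necessarily the paper's own derivation.

```latex
% Mutual information as entropy reduction: with H(S) fixed by the data,
% maximizing I(Z;S) is the same as minimizing H(S | Z).
I(Z; S) = H(S) - H(S \mid Z)

% Fano's inequality (weakened form), for any predictor \hat{Y} = f(Z):
P_e \;\ge\; \frac{H(Y \mid Z) - 1}{\log_2 |\mathcal{Y}|}
     \;=\; \frac{H(Y) - I(Z; Y) - 1}{\log_2 |\mathcal{Y}|}
```

Under these assumptions, a latent chain that retains little information about the answer forces a high error floor, which is one way to read the information-performance binding.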
