Latent Reasoning with Normalizing Flows
Abstract. Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream.
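As an illustration of what "tractable likelihood" means for a causal flow over continuous thoughts, the following is a minimal sketch, not the NF-CoT architecture. It assumes a toy elementwise affine conditioner that depends only on the previous latent; the names `conditioner`, `log_prob`, `sample` and the constants `a`, `b` are hypothetical. In the paper's setting the conditioner would be the LLM backbone with a TARFlow-style head. The sketch shows that the same left-to-right pass yields both sequential sampling and an exact log-density through the change-of-variables formula.

```python
import math
import random


def conditioner(prev, a=0.5, b=0.3):
    """Hypothetical causal conditioner: shift and log-scale from the previous latent."""
    mu = [a * p for p in prev]
    log_scale = [math.tanh(b * p) for p in prev]  # bounded for numerical stability
    return mu, log_scale


def log_prob(xs, a=0.5, b=0.3):
    """Exact log-density of a sequence of latent vectors via change of variables."""
    prev = [0.0] * len(xs[0])  # the first step is conditioned on a zero vector
    total = 0.0
    for x in xs:
        mu, log_scale = conditioner(prev, a, b)
        for x_i, m, s in zip(x, mu, log_scale):
            z = (x_i - m) * math.exp(-s)  # map the latent to the base space
            # standard-normal log-density of z plus the log-Jacobian term (-s)
            total += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - s
        prev = x
    return total


def sample(steps, dim, rng, a=0.5, b=0.3):
    """Left-to-right sampling: each latent depends only on earlier ones."""
    prev = [0.0] * dim
    xs = []
    for _ in range(steps):
        mu, log_scale = conditioner(prev, a, b)
        x = [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_scale)]
        xs.append(x)
        prev = x
    return xs
```

Calling `sample(4, 3, random.Random(0))` draws four three-dimensional latents, and `log_prob` scores any such sequence exactly, which is the property that text tokens already enjoy under the LM head.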
Introduction. Chain-of-thought (CoT) prompting has become a standard way to elicit reasoning in large language models (LLMs), improving performance by generating intermediate steps before the final answer (Wei et al., 2022; Kojima et al., 2022; Nye et al., 2021). One way to understand its effectiveness is that CoT introduces intermediate reasoning variables between the prompt and the answer, turning a direct input-output prediction into a conditioned prediction through a sampled reasoning path (Zelikman et al., 2022). In explicit CoT, these variables are represented as text tokens, making them naturally autoregressive, probabilistic, and likelihood-scored under the LLM. Yet text is a verbose, low-information-density medium for thought, making long reasoning costly and tying intermediate computation to surface forms (Hao et al., 2024). This inefficiency has motivated latent CoT methods that replace textual reasoning traces with continuous or soft embedding states.
Discussion / Conclusion. We presented NF-CoT, a latent reasoning framework that gives continuous CoT the same modeling status as language tokens by running an autoregressive normalizing flow inside the LLM’s causal stream. NF-CoT models an explicit distribution over reasoning trajectories with exact likelihood, which supports both supervised likelihood training and policy-gradient refinement in the continuous reasoning space. Across code generation benchmarks, NF-CoT improves accuracy over SFT and prior latent-reasoning baselines on Qwen3-8B-Base, and runs faster than LaDiR in training and inference. These results suggest that likelihood-based latent reasoning offers a practical interface for sampling, scoring, and refining continuous thoughts in LLMs.
Limitations. Our validation focuses on code-generation benchmarks; extending to other reasoning tasks remains future work.
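To make concrete why an exact likelihood enables policy-gradient refinement, the following is a toy sketch under strong assumptions: a one-dimensional Gaussian "thought" with a learnable mean, a hypothetical quadratic reward, and plain REINFORCE with a batch-mean baseline. It is not the training procedure used for NF-CoT; the function name `reinforce_mean` and all hyperparameters are illustrative. The point is only that the score-function estimator needs the gradient of log p(x), which is available in closed form when the density is tractable.

```python
import random


def reinforce_mean(target, steps=300, batch=64, lr=0.05, sigma=1.0, seed=0):
    """Toy REINFORCE on the mean of a 1-D Gaussian policy (illustrative only)."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(steps):
        xs = [rng.gauss(mu, sigma) for _ in range(batch)]
        # Hypothetical reward: higher when the sampled latent is near the target.
        rewards = [-(x - target) ** 2 for x in xs]
        baseline = sum(rewards) / batch  # batch-mean baseline to reduce variance
        # Score function of the Gaussian: d/dmu log N(x; mu, sigma^2) = (x - mu) / sigma^2
        grad = sum(
            (r - baseline) * (x - mu) / sigma**2 for r, x in zip(rewards, xs)
        ) / batch
        mu += lr * grad  # gradient ascent on expected reward
    return mu
```

With these settings, `reinforce_mean(2.0)` moves the mean from 0 toward the target of 2. In a flow-based model the closed-form score above would be replaced by backpropagation through the exact log-likelihood of the sampled trajectory.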