Can continuous thoughts have tractable likelihoods for sampling and scoring?
Most latent-reasoning methods discard the likelihood and sampling properties that made textual chain-of-thought trainable. Can normalizing flows recover those affordances in continuous thought space while preserving efficiency?
Latent reasoning promises a higher-bandwidth alternative to verbalized chain-of-thought: compute in compact continuous states before committing to text. But the vault's existing latent-reasoning thread, beginning with Can models reason without generating visible thinking tokens?, has a quiet liability: most continuous-thought methods throw away the very properties that made textual CoT trainable and steerable, namely a tractable likelihood, probabilistic sampling, left-to-right generation, and KV-cache decoding. Once thoughts are opaque vectors, you can't score a trajectory, sample alternatives, or refine with policy gradients. NF-CoT's contribution is to recover those affordances by modelling continuous thoughts as an autoregressive normalizing flow (TARFlow-style) inside the LLM's own causal stream. An NF head emits continuous-thought positions; the standard LM head emits text positions; both share one causal sequence.
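To make "tractable likelihood plus sampling" concrete, here is a minimal sketch of a single affine autoregressive flow layer over a short sequence of continuous thoughts. It is not TARFlow or the NF-CoT head itself: the `conditioner` function is a hypothetical stand-in for whatever the LLM's causal hidden state would provide, and all names and sizes are illustrative assumptions. The point is only that the same causal map can be run forward to sample a trajectory and inverted to score one exactly, via the change-of-variables formula.

```python
import math
import random

def conditioner(prefix, dim):
    """Hypothetical causal conditioner: maps earlier thoughts to (mu, log_scale).

    A real model would read the transformer's hidden state here; this toy
    version only looks at the running mean of the prefix.
    """
    if not prefix:
        return [0.0] * dim, [0.0] * dim
    mean = [sum(x[i] for x in prefix) / len(prefix) for i in range(dim)]
    mu = [0.5 * m for m in mean]
    log_scale = [0.1 * math.tanh(m) for m in mean]
    return mu, log_scale

def std_normal_logpdf(z):
    """Log-density of a standard normal, summed over dimensions."""
    return sum(-0.5 * (v * v + math.log(2 * math.pi)) for v in z)

def sample(num_steps, dim, rng):
    """Draw a trajectory left to right; return it with its exact log-likelihood."""
    thoughts, logp = [], 0.0
    for _ in range(num_steps):
        mu, log_scale = conditioner(thoughts, dim)
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x = [m + math.exp(s) * v for m, s, v in zip(mu, log_scale, z)]
        # Change of variables: log p(x) = log N(z) - sum(log_scale)
        logp += std_normal_logpdf(z) - sum(log_scale)
        thoughts.append(x)
    return thoughts, logp

def log_likelihood(thoughts):
    """Score any trajectory exactly by inverting the flow step by step."""
    dim = len(thoughts[0])
    logp = 0.0
    for t, x in enumerate(thoughts):
        mu, log_scale = conditioner(thoughts[:t], dim)
        z = [(xi - m) * math.exp(-s) for xi, m, s in zip(x, mu, log_scale)]
        logp += std_normal_logpdf(z) - sum(log_scale)
    return logp

rng = random.Random(0)
trajectory, logp_sampled = sample(num_steps=4, dim=3, rng=rng)
logp_scored = log_likelihood(trajectory)
```

Sampling and scoring agree up to floating-point error, which is the property an opaque-vector method lacks: there is no density to evaluate, so there is nothing to compare trajectories by.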
The deeper claim is about modeling status. Text tokens in a CoT are autoregressive, probabilistic, and likelihood-scored — that is why STaR-style training, sampling, and RL refinement work on them. NF-CoT gives continuous thoughts the same status: an explicit distribution over reasoning trajectories with exact likelihood, supporting both supervised likelihood training and policy-gradient refinement in continuous space. This is the missing tractability piece behind the "reasoning need not be verbalized" argument of Can models reason without generating visible thinking steps?, and it complements parameter-side latent scaling such as Can latent thought vectors scale language models beyond parameters? — both add latent structure, but NF-CoT specifically buys likelihood-based control over the latent chain rather than only capacity.
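Why an exact likelihood enables policy-gradient refinement can be sketched with a score-function (REINFORCE-style) estimator. The source does not say which policy-gradient method NF-CoT uses, so this is an illustrative choice, and everything in it is a toy assumption: a one-dimensional Gaussian stands in for the flow, `reward` is a hypothetical task signal, and only the mean is learned. What carries over is the dependence on a differentiable log-probability of the sampled thought.

```python
import random

def reward(thought, target=2.0):
    """Hypothetical task reward: highest when the thought equals `target`."""
    return -(thought - target) ** 2

def grad_log_prob(thought, mu, sigma):
    """d/d(mu) of log N(thought; mu, sigma^2), available in closed form."""
    return (thought - mu) / (sigma ** 2)

def refine(mu=0.0, sigma=1.0, steps=200, batch=256, lr=0.05, seed=0):
    """Move the thought distribution toward higher reward using sampled thoughts."""
    rng = random.Random(seed)
    for _ in range(steps):
        thoughts = [rng.gauss(mu, sigma) for _ in range(batch)]
        rewards = [reward(x) for x in thoughts]
        baseline = sum(rewards) / batch  # batch-mean baseline to reduce variance
        grad = sum((r - baseline) * grad_log_prob(x, mu, sigma)
                   for x, r in zip(thoughts, rewards)) / batch
        mu += lr * grad  # gradient ascent on expected reward
    return mu

final_mu = refine()
```

Without a tractable log-probability there is no `grad_log_prob` term to weight by reward, which is the sense in which opaque continuous thoughts block this kind of refinement.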
The caveat is scope and provenance. Validation is on code-generation benchmarks only, and the continuous thoughts are distilled from explicit CoT — the flow learns to compress a verbal trace, so it inherits whatever the teacher CoT encoded. The strongest counterargument: if a tractable continuous distribution is achievable only by distilling from text, latent reasoning may remain parasitic on verbalization rather than a genuinely independent reasoning medium. Still, exact likelihood in continuous space is the interface that makes sampling, scoring, and RL on non-verbal thought possible at all, which is a real unlock regardless of where the thoughts originate.
Inquiring lines that use this note as a source 6
This note is a source for these synthesized inquiries.
- How does latent state recursion differ mechanistically from chain-of-thought prompting?
- Can latent reasoning scale test-time compute without verbal tokens?
- What affordances do normalizing flows add over opaque vector reasoning?
- How much explicit verbal signal must latent chains retain to perform well?
- Why does textual chain-of-thought avoid the representational drift problem automatically?
- How do continuous concept tokens explore multiple reasoning paths without explicit sampling?
Related concepts in this collection 3
- Can models reason without generating visible thinking tokens?
  Explores whether intermediate reasoning must be verbalized as text tokens, or if models can think in hidden continuous space. Challenges a foundational assumption about how language models scale their reasoning capabilities.
  extends: supplies the tractable-likelihood affordances that opaque continuous-thought methods discard
- Can models reason without generating visible thinking steps?
  Do machine reasoning systems actually require verbalized chains of thought, or can they solve complex problems through hidden computation? This challenges how we measure and understand reasoning.
  grounds: provides a trainable, scorable mechanism for the non-verbal-reasoning claim
- Can latent thought vectors scale language models beyond parameters?
  Explores whether explicit latent thought vectors with dual-rate learning create new scaling dimensions independent of model size. This matters because it suggests alternatives to simply building larger models.
  convergent-with: both add latent structure, but NF-CoT targets likelihood-based control rather than capacity
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Latent Reasoning with Normalizing Flows
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
- What Makes Effective Supervision in Latent Chain-of-Thought? An Information-Theoretic Analysis
- Training Large Language Models to Reason in a Continuous Latent Space
- Reasoning to Learn from Latent Thoughts
- Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
Original note title
giving continuous thoughts a tractable likelihood via normalizing flows lets latent reasoning keep the sampling and scoring affordances that made textual CoT trainable