SYNTHESIS NOTE

Can continuous thoughts have tractable likelihoods for sampling and scoring?

Most latent-reasoning methods discard the likelihood and sampling properties that made textual chain-of-thought trainable. Can normalizing flows recover those affordances in continuous thought space while preserving efficiency?

Synthesis note · 2026-06-27 · sourced from Cognitive Models Latent

Latent reasoning promises a higher-bandwidth alternative to verbalized chain-of-thought: compute in compact continuous states before committing to text. But the vault's existing latent-reasoning thread, anchored by Can models reason without generating visible thinking tokens?, carries a quiet liability. Most continuous-thought methods throw away the very properties that made textual CoT trainable and steerable: a tractable likelihood, probabilistic sampling, left-to-right generation, and KV-cache decoding. Once thoughts are opaque vectors, you can't score a trajectory, sample alternatives, or refine with policy gradients. NF-CoT's contribution is to recover those affordances by modelling continuous thoughts with an autoregressive normalizing flow (TARFlow-style) inside the LLM's own causal stream. An NF head emits continuous-thought positions, the standard LM head emits text positions, and both share one causal sequence.
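A minimal Python sketch of what "exact likelihood plus left-to-right sampling" means for an autoregressive affine flow over continuous thoughts. It assumes a single affine layer whose shift and log-scale come from a hand-written toy `conditioner` standing in for the LLM's causal hidden state. A full model would use learned, typically stacked, flow layers, and NF-CoT's actual architecture is not reproduced here. The names `conditioner`, `log_likelihood`, and `sample` are hypothetical.

```python
import math
import random


def conditioner(prefix, dim):
    """Hypothetical stand-in for the causal LLM state that feeds the NF head.

    Maps previous continuous thoughts x_<t to a shift mu_t and log-scale s_t.
    A real model would read these off the transformer's hidden state (text
    context included); this toy rule is an assumption for illustration only.
    """
    if not prefix:
        return [0.0] * dim, [0.0] * dim
    mu = [0.5 * sum(v[i] for v in prefix) / len(prefix) for i in range(dim)]
    log_scale = [-0.1 * len(prefix)] * dim
    return mu, log_scale


def log_likelihood(thoughts):
    """Exact log p(x_1..x_T) via the change-of-variables formula.

    Each step inverts x_t = mu_t + exp(s_t) * z_t, giving z_t = (x_t - mu_t) * exp(-s_t).
    Because mu_t and s_t depend only on x_<t, the Jacobian dz/dx is triangular,
    so log|det dz/dx| is just -sum(s_t) over all positions and dimensions.
    """
    dim = len(thoughts[0])
    total = 0.0
    for t, x in enumerate(thoughts):
        mu, s = conditioner(thoughts[:t], dim)
        z = [(x[i] - mu[i]) * math.exp(-s[i]) for i in range(dim)]
        # Standard-normal base log-density of z_t.
        total += sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
        # Log-determinant contribution of this position.
        total -= sum(s)
    return total


def sample(num_thoughts, dim, rng):
    """Left-to-right sampling: draw z_t ~ N(0, I), then map it forward."""
    thoughts = []
    for _ in range(num_thoughts):
        mu, s = conditioner(thoughts, dim)
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        thoughts.append([mu[i] + math.exp(s[i]) * z[i] for i in range(dim)])
    return thoughts


# Sample several alternative thought trajectories and rank them by exact likelihood.
rng = random.Random(0)
candidates = [sample(num_thoughts=4, dim=3, rng=rng) for _ in range(5)]
ranked = sorted(candidates, key=log_likelihood, reverse=True)
for traj in ranked:
    print(round(log_likelihood(traj), 3))
```

Scoring a given trajectory only needs each prefix x_<t, so a transformer could compute all the conditioners in one parallel causal pass, whereas sampling must proceed position by position. This mirrors the asymmetry between teacher-forced scoring and KV-cached decoding of text.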

The deeper claim is about modeling status. Text tokens in a CoT are autoregressive, probabilistic, and likelihood-scored — that is why STaR-style training, sampling, and RL refinement work on them. NF-CoT gives continuous thoughts the same status: an explicit distribution over reasoning trajectories with exact likelihood, supporting both supervised likelihood training and policy-gradient refinement in continuous space. This is the missing tractability piece behind the "reasoning need not be verbalized" argument of Can models reason without generating visible thinking steps?, and it complements parameter-side latent scaling such as Can latent thought vectors scale language models beyond parameters? — both add latent structure, but NF-CoT specifically buys likelihood-based control over the latent chain rather than only capacity.
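The two training modes named above can be illustrated on the smallest possible case. This Python sketch assumes a single one-dimensional continuous thought drawn from a one-step affine flow x = b + exp(s)·z with z ~ N(0, 1), i.e. a Gaussian with a learnable shift b. The teacher values, the reward function, and all function names are hypothetical. Plain REINFORCE with a batch-mean baseline is a generic policy-gradient method swapped in here, not necessarily the estimator NF-CoT uses. The point is only that both phases consume the same exact log-density and its analytic gradient.

```python
import math
import random

LOG_SCALE = 0.0  # fixed log-std s of the toy one-step flow (assumption)


def log_prob(x, b, s=LOG_SCALE):
    """Exact log-density of x = b + exp(s) * z with z ~ N(0, 1)."""
    z = (x - b) * math.exp(-s)
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi) - s


def score(x, b, s=LOG_SCALE):
    """Analytic d/db log p(x); available because the likelihood is exact."""
    return (x - b) * math.exp(-2 * s)


def fit_by_likelihood(teacher_thoughts, b=0.0, lr=0.5, steps=100):
    """Supervised phase: gradient ascent on the mean log-likelihood of
    thoughts assumed to be distilled from a teacher CoT (here, plain floats)."""
    for _ in range(steps):
        grad = sum(score(x, b) for x in teacher_thoughts) / len(teacher_thoughts)
        b += lr * grad
    return b


def refine_by_reinforce(b, reward, rng, lr=0.05, steps=200, batch=64, s=LOG_SCALE):
    """RL phase: REINFORCE with a batch-mean baseline, using the exact score."""
    for _ in range(steps):
        xs = [b + math.exp(s) * rng.gauss(0.0, 1.0) for _ in range(batch)]
        rewards = [reward(x) for x in xs]
        baseline = sum(rewards) / batch
        grad = sum((r - baseline) * score(x, b, s) for x, r in zip(xs, rewards)) / batch
        b += lr * grad
    return b


teacher = [1.0, 1.2, 0.8, 1.0]       # hypothetical distilled thought values
b_sft = fit_by_likelihood(teacher)   # converges toward the teacher mean
rng = random.Random(0)
b_rl = refine_by_reinforce(b_sft, lambda x: -(x - 3.0) ** 2, rng)  # reward peaks at x = 3
print(round(b_sft, 4), round(log_prob(1.0, b_sft), 4), round(b_rl, 2))
```

In a real model, b would be the flow's network weights and the gradient would come from automatic differentiation. The overall shape would plausibly stay the same: maximum likelihood on distilled thoughts first, then reward-weighted score-function updates on sampled thoughts.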

The caveat is scope and provenance. Validation is on code-generation benchmarks only, and the continuous thoughts are distilled from explicit CoT — the flow learns to compress a verbal trace, so it inherits whatever the teacher CoT encoded. The strongest counterargument: if a tractable continuous distribution is achievable only by distilling from text, latent reasoning may remain parasitic on verbalization rather than a genuinely independent reasoning medium. Still, exact likelihood in continuous space is the interface that makes sampling, scoring, and RL on non-verbal thought possible at all, which is a real unlock regardless of where the thoughts originate.

Related concepts in this collection

Can models reason without generating visible thinking tokens? Explores whether intermediate reasoning must be verbalized as text tokens, or if models can think in hidden continuous space. Challenges a foundational assumption about how language models scale their reasoning capabilities.
extends: supplies the tractable-likelihood affordances that opaque continuous-thought methods discard
Can models reason without generating visible thinking steps? Do machine reasoning systems actually require verbalized chains of thought, or can they solve complex problems through hidden computation? This challenges how we measure and understand reasoning.
grounds: provides a trainable, scorable mechanism for the non-verbal-reasoning claim
Can latent thought vectors scale language models beyond parameters? Explores whether explicit latent thought vectors with dual-rate learning create new scaling dimensions independent of model size. This matters because it suggests alternatives to simply building larger models.
convergent-with: both add latent structure, but NF-CoT targets likelihood-based control rather than capacity
