SYNTHESIS NOTE

Can continuous thoughts have tractable likelihoods for sampling and scoring?

Most latent-reasoning methods discard the likelihood and sampling properties that made textual chain-of-thought trainable. Can normalizing flows recover those affordances in continuous thought space while preserving efficiency?

Synthesis note · 2026-06-27 · sourced from Cognitive Models Latent

Latent reasoning promises a higher-bandwidth alternative to verbalized chain-of-thought: compute in compact continuous states before committing to text. But the vault's existing latent-reasoning thread, going back to "Can models reason without generating visible thinking tokens?", has a quiet liability: most continuous-thought methods throw away the very properties that made textual CoT trainable and steerable, namely a tractable likelihood, probabilistic sampling, left-to-right generation, and KV-cache decoding. Once thoughts are opaque vectors with no explicit distribution over them, you can't score a trajectory, sample alternatives, or refine with policy gradients. NF-CoT's contribution is to recover those affordances by modelling continuous thoughts as an autoregressive normalizing flow (TARFlow-style) inside the LLM's own causal stream. An NF head emits continuous-thought positions; the standard LM head emits text positions; both share one causal sequence.
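To make "sample and score" concrete, here is a minimal NumPy sketch of a single affine autoregressive flow step over a short sequence of thought vectors. It is an illustration under stated assumptions, not NF-CoT's implementation: the dimensions D and T, the weight matrices, and the conditioner function are all hypothetical stand-ins for the LLM's causal hidden state feeding an NF head. A single affine step like this one is just a conditional Gaussian; practical flows stack several such invertible transformations.

```python
import numpy as np

D = 4  # thought-vector dimension (hypothetical)
T = 3  # number of continuous-thought positions (hypothetical)

# Toy parameters standing in for the causal network behind an NF head (assumption).
_init = np.random.default_rng(0)
W_mu = _init.normal(scale=0.1, size=(D, D))
W_ls = _init.normal(scale=0.1, size=(D, D))

def conditioner(prefix):
    """Return (mu, log_sigma) for the next thought, using earlier thoughts only."""
    ctx = prefix.mean(axis=0) if len(prefix) else np.zeros(D)
    return ctx @ W_mu, np.tanh(ctx @ W_ls)  # tanh keeps the scale bounded

def sample(num_steps, rng):
    """Left-to-right sampling: z_t = mu_t + sigma_t * eps_t with eps_t ~ N(0, I)."""
    thoughts = np.zeros((0, D))
    for _ in range(num_steps):
        mu, log_sigma = conditioner(thoughts)
        z = mu + np.exp(log_sigma) * rng.standard_normal(D)
        thoughts = np.vstack([thoughts, z])
    return thoughts

def log_prob(thoughts):
    """Exact log-likelihood of a thought trajectory via change of variables."""
    total = 0.0
    for t in range(len(thoughts)):
        mu, log_sigma = conditioner(thoughts[:t])
        eps = (thoughts[t] - mu) * np.exp(-log_sigma)  # invert the affine map
        base = -0.5 * np.sum(eps ** 2) - 0.5 * D * np.log(2.0 * np.pi)
        total += base - np.sum(log_sigma)  # add log |det d eps / d z|
    return total

trajectory = sample(T, np.random.default_rng(1))
score = log_prob(trajectory)
```

Because each step conditions only on earlier thoughts, the same pass that samples a trajectory can also score it, which is the property the paragraph above says opaque latent vectors lack.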

The deeper claim is about modeling status. Text tokens in a CoT are autoregressive, probabilistic, and likelihood-scored; that is why STaR-style training, sampling, and RL refinement work on them. NF-CoT gives continuous thoughts the same status: an explicit distribution over reasoning trajectories with exact likelihood, supporting both supervised likelihood training and policy-gradient refinement in continuous space. This is the missing tractability piece behind the "reasoning need not be verbalized" argument of "Can models reason without generating visible thinking steps?", and it complements parameter-side latent scaling such as "Can latent thought vectors scale language models beyond parameters?". Both add latent structure, but NF-CoT specifically buys likelihood-based control over the latent chain rather than only capacity.
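The two training signals named above can be written out under the assumption of an affine autoregressive flow. The notation here (prompt x, thoughts z_{1:T}, reward R, baseline b, per-step shift mu_t and scale sigma_t) is introduced for illustration and is not taken from the source.

```latex
% Exact log-likelihood of a thought trajectory (change of variables),
% where \mu_t and \sigma_t depend only on x and z_{<t}:
\log p_\theta(z_{1:T} \mid x)
  = \sum_{t=1}^{T} \Big[ \log \mathcal{N}(\epsilon_t;\, 0, I)
      - \sum_{d=1}^{D} \log \sigma_{t,d} \Big],
\qquad
\epsilon_t = \frac{z_t - \mu_t}{\sigma_t}

% Supervised likelihood training on target thoughts z^{*}_{1:T}:
\mathcal{L}_{\mathrm{NLL}}(\theta) = -\log p_\theta(z^{*}_{1:T} \mid x)

% Score-function (REINFORCE) policy gradient with baseline b:
\nabla_\theta J(\theta)
  = \mathbb{E}_{z_{1:T} \sim p_\theta(\cdot \mid x)}
    \big[ (R(z_{1:T}) - b)\, \nabla_\theta \log p_\theta(z_{1:T} \mid x) \big]
```

Both objectives need the log-density to be computable in closed form, which is what the flow parameterization supplies and what a plain deterministic hidden-state rollout does not.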

The caveat is scope and provenance. Validation is on code-generation benchmarks only, and the continuous thoughts are distilled from explicit CoT — the flow learns to compress a verbal trace, so it inherits whatever the teacher CoT encoded. The strongest counterargument: if a tractable continuous distribution is achievable only by distilling from text, latent reasoning may remain parasitic on verbalization rather than a genuinely independent reasoning medium. Still, exact likelihood in continuous space is the interface that makes sampling, scoring, and RL on non-verbal thought possible at all, which is a real unlock regardless of where the thoughts originate.

Original note title

giving continuous thoughts a tractable likelihood via normalizing flows lets latent reasoning keep the sampling and scoring affordances that made textual CoT trainable