INQUIRING LINE

When AI seems to reason step-by-step inside itself, it turns out it's faking the loop — just pattern-matching.

Do latent sequence vectors outperform per-token latent iterative computation for reasoning?

This explores two rival ways of doing reasoning inside an LLM's latent space — operating on whole-sequence or concept-level vectors versus grinding through step-by-step iterative computation token by token — and which the corpus thinks actually works.

The corpus tilts fairly clearly toward the sequence-vector camp, but mostly because per-token latent iteration turns out to be something LLMs imitate rather than perform.

The sharpest evidence against per-token latent iteration comes from the note "Do large language models actually perform iterative optimization?". When asked to run an optimization or numerical procedure internally, a model does not actually loop: it recognizes the problem as template-similar to something seen in training and emits a plausible-looking but incorrect value. The failure persists across model scale and training approach. That fits a broader pattern in the collection: "Does chain-of-thought reasoning reveal genuine inference or pattern matching?" argues that chain-of-thought reproduces familiar reasoning shapes rather than executing novel inference, and "Do large language models reason symbolically or semantically?" shows that performance collapses when the semantic cues are stripped and only the rules remain. The token-level machinery is good at pattern completion, not at faithful iteration.
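
To make "faithful iteration" concrete, here is a minimal sketch of a procedure whose every intermediate value can be checked. It uses Newton's iteration for a square root as a stand-in numerical method. The function names, the tolerance, and the `claimed` trace are all hypothetical illustrations, not the evaluation protocol or data of the cited work.

```python
from typing import List, Optional


def newton_sqrt_iterates(a: float, x0: float, steps: int) -> List[float]:
    """Return the iterates x_1..x_steps of Newton's update x <- (x + a / x) / 2."""
    iterates = []
    x = x0
    for _ in range(steps):
        x = 0.5 * (x + a / x)  # each iterate depends on the previous one
        iterates.append(x)
    return iterates


def first_divergence(claimed: List[float], reference: List[float],
                     tol: float = 1e-6) -> Optional[int]:
    """Index of the first step where a claimed trace leaves the reference, else None."""
    for i, (c, r) in enumerate(zip(claimed, reference)):
        if abs(c - r) > tol:
            return i
    return None


reference = newton_sqrt_iterates(a=2.0, x0=1.0, steps=4)

# Hypothetical trace (invented for illustration): it lands near the right
# limit but skips the real intermediate value at the second step.
claimed = [1.5, 1.4142, 1.4142, 1.4142]

print(first_divergence(claimed, reference, tol=1e-3))  # prints 1
```

A trace that merely resembles the answer tends to jump toward the familiar final value, so comparing step by step against a reference like this separates executing the loop from recognizing the template.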

The sequence-vector approaches look healthier. Meta's Large Concept Model reasons over sentence embeddings in a language-agnostic space before decoding, and "Can reasoning happen at the sentence level instead of tokens?" reports that this hierarchical, paragraph-level planning yields more coherent output than flat token-by-token generation. Complementing it, "Can latent thought vectors scale language models beyond parameters?" shows that learning explicit latent "thought" vectors opens a scaling axis independent of parameter count, improving sample efficiency and few-shot reasoning. Both treat reasoning as something to do over compressed representations, not as a token-serial march.

The more interesting twist is that the contest may be a false binary. "Can reasoning systems scale faster by exploring parallel paths instead?" argues that the real lever is not depth of iteration at all: exploring many latent trajectories in parallel avoids the serial latency penalty of depth while sampling the solution distribution more effectively, which suggests that width can beat depth in latent space. And "Do transformers hide reasoning before producing filler tokens?" shows that models trained with hidden chain-of-thought tokens compute correct answers in early layers (1-3) and then suppress them in the final layers to produce format-compliant filler, so the per-token surface trace can be decorative relative to where the computation actually lives.
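
The logit-lens idea behind that last finding can be sketched in a few lines: decode each layer's hidden state with the same unembedding matrix and see which token it favors. The sketch below is a toy under stated assumptions. The vocabulary, the matrix, and the hidden states are made-up values chosen to illustrate the pattern, and the final layer normalization that a real transformer applies before unembedding is omitted.

```python
from typing import List, Sequence

VOCAB = ["7", ".", "the"]  # toy vocabulary; "." plays the filler token

# Toy unembedding matrix, one row per vocabulary item (made-up values).
UNEMBED = [
    [1.0, 0.0],  # "7"
    [0.0, 1.0],  # "."
    [0.5, 0.5],  # "the"
]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def logit_lens(hidden_by_layer: List[List[float]]) -> List[str]:
    """Return the top-scoring token for each layer's hidden state."""
    top_tokens = []
    for hidden in hidden_by_layer:
        logits = [dot(row, hidden) for row in UNEMBED]
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        top_tokens.append(VOCAB[best])
    return top_tokens


# Made-up hidden states: early layers point at the answer token,
# the final layer points at the filler token.
hidden_states = [
    [2.0, 0.1],  # early layer
    [1.5, 1.0],  # middle layer
    [0.2, 2.0],  # final layer
]

print(logit_lens(hidden_states))  # prints ['7', '7', '.']
```

In this toy, the answer token leads in the earlier layers and is displaced by the filler only at the end, which is the shape of result the note describes. Reading only the final-layer output would miss it.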

So the takeaway you might not have gone looking for: the question of latent vectors vs. per-token iteration is partly answered by the discovery that LLMs don't genuinely iterate in latent space to begin with. They imitate the form of iteration. That reframes the design choice — the win comes from giving the model a representation worth reasoning over (concepts, latent thoughts, parallel paths), not from coaxing it to loop one token at a time.

Sources (7 notes)

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can reasoning happen at the sentence level instead of tokens?

Meta's Large Concept Model operates on sentence embeddings rather than tokens, reasoning in a language-agnostic space before decoding to any target language. This hierarchical approach with paragraph-level planning produces more coherent output than flat token generation.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.

Can reasoning systems scale faster by exploring parallel paths instead?

GRAM demonstrates that recursive reasoning models should maintain and explore multiple latent trajectories in parallel, not only deepen single paths. Width-scaling avoids the serial latency penalty of depth while sampling the solution distribution more effectively on ambiguous problems.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tasked with re-testing claims about latent reasoning in LLMs. The question remains open: do latent sequence vectors outperform per-token latent iterative computation for reasoning?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–08/2025. The library's core claims:
• LLMs do NOT faithfully execute iterative numerical methods in latent space; they template-match instead (~2024–25).
• Chain-of-thought is constrained imitation of reasoning form, not genuine abstract inference (2025).
• Sequence-level reasoning (e.g., over sentence embeddings or explicit latent thought vectors) yields more coherent outputs than token-serial generation; latent thoughts open independent scaling axes beyond parameter count (2024–2025).
• Parallel latent trajectories match serial reasoning benefits without latency cost; width beats depth (2025).
• Models compute answers in early layers then overwrite with format filler — surface token traces are decorative (2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023-05): In-Context Semantic Reasoners rather than Symbolic
• arXiv:2412.06769 (2024-12): Training LLMs to Reason in Continuous Latent Space
• arXiv:2502.01567 (2025-02): Posterior Inference of Latent Thought Vectors
• arXiv:2506.02878 (2025-06): CoT as Tight Constraint to Imitate

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models, training methods, in-context tooling (memory, caching, multi-agent orchestration), or evaluation have since RELAXED or OVERTURNED it. Separate durable questions (e.g., "can LLMs truly iterate?") from perishable limitations (e.g., "latent thought models don't scale"). Cite what resolved each constraint; flag where it still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months that challenges the "latent vectors win" narrative or shows per-token latent iteration CAN work under certain conditions.
(3) Propose 2 research questions that ASSUME the regime may have shifted: e.g., what if hybrid depth–width strategies or emergent in-context iteration now close the gap?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
