Does next-token prediction actually explain how human thought works?
This reads the question as: does the mechanism that powers language models — predicting the next token — tell us anything real about human cognition, or is the resemblance superficial? The corpus suggests the resemblance is mostly superficial, and the more interesting story is how differently the two systems actually work.
This explores whether next-token prediction is a window into human thought or just a powerful trick that happens to produce thought-like output. The collection leans hard toward the second reading, while complicating it in useful ways. The most direct challenge comes from work showing that when transformers read, they integrate every word in parallel and additively, weighting all the tokens at once rather than selectively suppressing the irrelevant ones, which is exactly why they miss jokes, puns, and frame-dependent meaning (see "Why do AI systems miss jokes and wordplay so consistently?"). Human comprehension involves picking one frame and letting it silence the others; weighted parallel aggregation has no operation for doing that. So at the level of the basic operation, the answer is: this isn't how we think.
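To make "weighting all the tokens at once" concrete, here is a minimal sketch of softmax-weighted aggregation, the core step of attention. It is a simplification: real transformers add learned projections and multiple heads, and the scores and value vectors below are invented purely for illustration. The point it shows is narrow. Softmax weights are always strictly positive, so a low-scoring token is down-weighted but never removed from the sum.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(scores, values):
    """Weighted sum of per-token value vectors: every token is added in."""
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

# Hypothetical scores for four tokens; the last one is "irrelevant" to the frame.
scores = [2.0, 1.0, 0.5, -3.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]

weights, mixed = aggregate(scores, values)
for w in weights:
    print(f"{w:.4f}")  # the last weight is small but not zero
```

The irrelevant token still leaks into `mixed` in proportion to its weight, which is the additive behavior the note contrasts with a frame that silences its competitors.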
The deeper cut is what happens when these models appear to reason. Several notes converge on the finding that visible reasoning traces are persuasive performances, not faithful records of computation: invalid logical steps work nearly as well as valid ones, and corrupted traces generalize about the same, meaning the semantic correctness we read into them isn't what produces the answer (see "Do reasoning traces show how models actually think?"). Chain-of-thought turns out to be shaped far more by format and the position of demonstrations than by logical content (training format influences strategy 7.5× more than the actual domain), making it pattern-guided generation rather than formal inference (see "What makes chain-of-thought reasoning actually work?"). And when you push it outside its training distribution, it degrades predictably, producing fluent but logically inconsistent output that imitates the form of reasoning without the underlying logic (see "Does chain-of-thought reasoning actually generalize beyond training data?"). If you were hoping next-token prediction explains thinking, this is the corpus telling you it explains the appearance of thinking.
Here's the twist that makes the question worth asking, though. Token prediction isn't a flat, surface process: there's real internal structure that doesn't map onto the visible tokens at all. Models can scale up their reasoning entirely in latent space, iterating on hidden states without ever verbalizing a step, which suggests that the spoken-out-loud chain of thought is a training artifact rather than a requirement of the reasoning itself (see "Can models reason without generating visible thinking tokens?"). Even more strikingly, transformers trained with hidden chain-of-thought tokens can compute the correct answer in their earliest layers and then actively suppress that representation to emit format-compliant filler, so the real work and the visible output are decoupled (see "Do transformers hide reasoning before producing filler tokens?"). So the prediction mechanism hides a lot of machinery the next-token framing alone would never reveal.
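The layer-by-layer claim rests on logit lens analysis, which reads out what each layer would predict if it were the last. A toy sketch of the idea follows. Every number in it is invented, the two-token vocabulary and the `logit_lens` helper are hypothetical, and real analyses work on actual model activations and typically also apply the model's final normalization before unembedding.

```python
def logit_lens(hidden_by_layer, unembed, vocab):
    """Project each layer's hidden state through the unembedding; return the top token per layer."""
    top_tokens = []
    for hidden in hidden_by_layer:
        # One logit per vocabulary item: dot product of hidden state and that item's row.
        logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
        best = max(range(len(vocab)), key=lambda i: logits[i])
        top_tokens.append(vocab[best])
    return top_tokens

# Toy setup (all values invented): "42" stands for the answer, "..." for filler.
vocab = ["42", "..."]
unembed = [[1.0, 0.0], [0.0, 1.0]]  # one row per vocabulary item
hidden_by_layer = [
    [0.9, 0.1],  # early layers: the answer direction dominates
    [0.8, 0.3],
    [0.4, 0.6],  # later layers: the filler direction takes over
    [0.1, 0.9],
]

per_layer = logit_lens(hidden_by_layer, unembed, vocab)
print(per_layer)  # ['42', '42', '...', '...']
```

In this toy, the answer is the top prediction early and is displaced by filler late, which is the shape of the decoupling the note describes, not evidence for it.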
That hidden machinery is also lumpy in a way that's almost cognitive-looking. Only about 20% of tokens are high-entropy 'forking points' where the real decisions get made, and training on just those matches or exceeds full training (see "Do high-entropy tokens drive reasoning model improvements?"). Specific tokens like 'Wait' and 'Therefore' spike in mutual information with the correct answer, and suppressing them measurably hurts reasoning while suppressing an equal number of random tokens doesn't (see "Do reflection tokens carry more information about correct answers?"). You could read this as the model concentrating something like deliberation at a few pivot points. But the corpus keeps pulling the rug: local memorization from immediately preceding tokens accounts for up to 67% of reasoning errors, which is a profoundly un-human failure signature (see "Where do memorization errors arise in chain-of-thought reasoning?").
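A "high-entropy token" here means a position where the model's next-token distribution is spread out rather than peaked. The sketch below computes Shannon entropy per position and keeps the top fifth. The distributions are made up, the `forking_positions` helper is hypothetical, and the 20% cutoff simply mirrors the figure quoted above; it is not the cited work's selection procedure.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def forking_positions(distributions, fraction=0.2):
    """Return indices of the highest-entropy positions, keeping `fraction` of them."""
    entropies = [entropy(d) for d in distributions]
    keep = max(1, round(fraction * len(distributions)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:keep]), entropies

# Invented next-token distributions for five positions in a reasoning trace.
distributions = [
    [0.97, 0.01, 0.01, 0.01],      # near-certain continuation
    [0.25, 0.25, 0.25, 0.25],      # genuine fork: four equally likely options
    [0.90, 0.05, 0.03, 0.02],
    [0.99, 0.005, 0.003, 0.002],
    [0.85, 0.05, 0.05, 0.05],
]

positions, entropies = forking_positions(distributions)
print(positions)  # [1]
```

Restricting a training update to the positions returned here is the kind of masking the note alludes to, though how the cited work applies it is not specified in this collection.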
The thing you might not have known you wanted to know: researchers are actively trying to *make* next-token prediction more thought-like rather than treating it as already being thought. Reinforcement pre-training reframes predicting the next token as a reasoning task with verifiable rewards drawn from the corpus itself (see "Can next-token prediction become a reasoning task with RL?"), and other work plants chain-of-thought into pretraining as an exploratory action rewarded by how much it improves prediction (see "Can chain-of-thought reasoning be learned during pretraining itself?"). The very existence of this engineering effort is the answer to your question: if next-token prediction natively explained how thought works, nobody would need to bolt reasoning onto it. It's a substrate that can be shaped toward reasoning, not a theory of mind.
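"Rewarded by how much it improves prediction" can be read as a log-likelihood difference: how much more probable the true next token becomes once the model conditions on its own thought. The sketch below assumes that reading. The function name and the probabilities are hypothetical, and the exact baseline used in the cited work is not given here.

```python
import math

def information_gain_reward(p_with_thought, p_without_thought):
    """Log-likelihood improvement on the observed next token from conditioning on a thought."""
    return math.log(p_with_thought) - math.log(p_without_thought)

# Invented probabilities that the model assigns to the true next token.
helpful = information_gain_reward(0.60, 0.20)    # the thought raised the probability
unhelpful = information_gain_reward(0.10, 0.20)  # the thought lowered it

print(f"{helpful:.3f}")    # positive reward
print(f"{unhelpful:.3f}")  # negative reward
```

Because the reward is computed from the model's own likelihoods on corpus text, it needs no external verifier, which matches the "verifier-free" description in the source notes.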
Sources (11 notes)
Transformers integrate token information through weighted parallel aggregation rather than selective suppression of irrelevant words. This structural difference explains consistent failures with jokes, wordplay, and frame-dependent meaning—not knowledge gaps, but missing cognitive operations.
LLM reasoning traces function as persuasive performances rather than reliable explanations of computation. Invalid logical steps perform nearly as well as valid ones, and corrupted traces generalize comparably, showing that semantic correctness is not what produces the performance gains.
Research shows training format shapes reasoning strategy 7.5× more than domain, demo position swings accuracy 20%, and invalid CoT prompts work as well as valid ones. CoT is pattern-guided generation, not formal logic.
DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.
Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.
Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.
Specific tokens like "Wait" and "Therefore" show sharp spikes in mutual information with correct answers. Suppressing them harms reasoning while suppressing an equal number of random tokens does not, and representation recycling improves accuracy 20%.
STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization—based on preceding tokens—accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.
Reinforcement Pre-Training transforms next-token prediction into a reasoning task by providing verifiable rewards from the corpus itself, eliminating reward hacking and enabling inference-time scaling during pretraining. This suggests token-level reasoning patterns during pretraining strengthen downstream RL fine-tuning.
RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.