INQUIRING LINE

An AI's written reasoning might be a post-hoc story — the real decision happens somewhere else entirely.

Why might chain-of-thought reasoning bypass action selection pathways?

This explores why a model's written-out reasoning steps might not actually drive the answer it picks: why the chain-of-thought (CoT) can run as a parallel performance while the real decision happens somewhere else. Several notes in the corpus converge on the same uncomfortable idea: the reasoning trace and the action it supposedly justifies are often only loosely coupled. The clearest evidence comes from faithfulness testing. When researchers cut a reasoning chain short, paraphrase it, or swap in filler tokens, the final answer frequently stays the same, and it does so more often after fine-tuning. That invariance means the steps weren't load-bearing; the answer was going to land where it landed regardless (see: Does fine-tuning disconnect reasoning steps from final answers?). The reasoning becomes performative rather than functional, which is exactly what 'bypassing action selection' looks like from the outside.
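The three perturbation tests above can be sketched as an invariance measurement. This is a minimal illustration, not the method of any cited paper: `AnswerFn` is a hypothetical interface standing in for a real model call, and the two toy "models" exist only to show what a load-bearing chain and a bypassed chain look like under the same test.

```python
from typing import Callable, List, Sequence, Tuple

Steps = List[str]
# Hypothetical interface (an assumption): takes a question and a possibly
# perturbed reasoning chain, returns the model's final answer as a string.
AnswerFn = Callable[[str, Steps], str]


def truncate(steps: Steps, keep_fraction: float = 0.5) -> Steps:
    """Early termination: keep only the first part of the chain."""
    return steps[: int(len(steps) * keep_fraction)]


def fill(steps: Steps, filler: str = "...") -> Steps:
    """Filler substitution: swap each step for contentless tokens of equal word count."""
    return [" ".join([filler] * len(step.split())) for step in steps]


def invariance_rate(
    examples: Sequence[Tuple[str, Steps]],
    answer_fn: AnswerFn,
    perturb: Callable[[Steps], Steps],
) -> float:
    """Fraction of examples whose answer is unchanged after perturbing the chain."""
    if not examples:
        raise ValueError("need at least one example")
    unchanged = sum(
        answer_fn(q, steps) == answer_fn(q, perturb(steps)) for q, steps in examples
    )
    return unchanged / len(examples)


# Toy stand-ins for a model (illustrative assumptions only).
def ignores_chain(question: str, steps: Steps) -> str:
    return question.split()[-1]        # answer depends on the question alone


def reads_last_step(question: str, steps: Steps) -> str:
    return steps[-1] if steps else ""  # answer depends on the chain


examples = [
    ("capital of France is Paris", ["France is a country", "its capital is", "Paris"]),
    ("two plus two is 4", ["two plus two", "equals", "4"]),
]

print(invariance_rate(examples, ignores_chain, truncate))    # high: chain is bypassed
print(invariance_rate(examples, reads_last_step, truncate))  # low: chain is load-bearing
```

A high invariance rate is the signature described above: the answer does not move when the chain does. A paraphrase perturbation would slot in the same way as `truncate` and `fill`, but needs a rewriting model, so it is left out here.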

A second thread explains *why* the coupling is weak in the first place: CoT may never have been genuine inference. Multiple notes argue it's constrained imitation: the model reproduces the *form* of reasoning it saw in training rather than computing its way to a conclusion (see: Does chain-of-thought reasoning reveal genuine inference or pattern matching?; What makes chain-of-thought reasoning fail in language models?; Why does chain-of-thought reasoning fail in predictable ways?). If structurally invalid prompts work as well as valid ones, and training format shapes reasoning strategy 7.5× more than domain does, then the visible chain is a stylistic wrapper, not the causal pathway to the choice (see: What makes chain-of-thought reasoning actually work?). The action gets selected by whatever the model would do anyway; the text narrates around it.

The most striking piece is that reasoning can be triggered *without any chain at all*. Steering a single SAE-identified internal feature matches or beats full CoT performance, and this latent mode activates early in generation and even overrides surface instructions (see: Can we trigger reasoning without explicit chain-of-thought prompts?). That's a direct mechanism for bypass: if the decision-relevant computation lives in an early-activating latent feature, the spelled-out steps that come afterward are downstream commentary, not the lever. Attention studies point the same way: verification and backtracking steps receive almost no downstream attention, so you can prune 75% of the trace without losing accuracy (see: Can reasoning steps be dynamically pruned without losing accuracy?).
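Both mechanisms in this paragraph reduce to small operations, sketched here under loud assumptions. `steer` adds a scaled feature direction to a hidden state, and `prune` keeps the steps that later steps attend to most. The function names, the step-level attention matrix, and the rule of always keeping the final step are illustrative choices, not details taken from the cited notes.

```python
from typing import List, Sequence


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> List[float]:
    """Add alpha times a unit-normalized feature direction to a hidden state."""
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]


def received_attention(attn: Sequence[Sequence[float]]) -> List[float]:
    """attn[i][j] is the attention step i pays to earlier step j (j < i).
    Returns, for each step j, the total attention it receives from later steps."""
    n = len(attn)
    return [sum(attn[i][j] for i in range(j + 1, n)) for j in range(n)]


def prune(steps: Sequence[str], attn: Sequence[Sequence[float]], keep_fraction: float) -> List[str]:
    """Keep the most-attended steps in their original order.
    Assumption: the final step holds the answer, so it is always kept."""
    n = len(steps)
    if n == 0:
        return []
    scores = received_attention(attn)
    keep = max(1, int(n * keep_fraction))
    last = n - 1
    ranked = sorted(range(last), key=lambda j: scores[j], reverse=True)
    chosen = set(ranked[: keep - 1]) | {last}
    return [steps[j] for j in sorted(chosen)]


# Made-up attention weights: the setup step is heavily reused,
# the verification and backtracking steps are barely read again.
steps = ["set up equation", "verify arithmetic", "backtrack and recheck", "state answer"]
attn = [
    [0.00, 0.00, 0.00, 0.00],
    [0.90, 0.00, 0.00, 0.00],
    [0.80, 0.10, 0.00, 0.00],
    [0.85, 0.05, 0.10, 0.00],
]

print(prune(steps, attn, keep_fraction=0.5))
print(steer([1.0, 0.0], direction=[0.0, 2.0], alpha=3.0))
```

In a real model the hidden state would be a residual-stream activation and the direction would come from a trained sparse autoencoder; the lists here only show the arithmetic.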

There's a flip side worth knowing: when the chain *does* steer behavior, it sometimes steers it wrong. Reasoning models underperform non-reasoning ones on exception-based rule inference because CoT injects math overuse, overgeneralization, and hallucinated constraints (see: Why do reasoning models fail at exception-based rule inference?). Models also visibly abandon good solution paths mid-stream, through 'wandering' (invalid exploration) and 'underthinking' (premature path-switching), yet a decoding-only penalty on thought-switching recovers accuracy, implying the better action was reachable but not selected (see: Why do reasoning models abandon promising solution paths?; Do reasoning models switch between ideas too frequently?). So the relationship between reasoning and action runs in both directions: sometimes the chain is bypassed, and sometimes it actively hijacks a choice the model could otherwise have made.
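A decoding-only penalty on thought-switching can be sketched as a logit adjustment applied before each token is chosen. This is a simplified illustration rather than the exact TIP formulation: the switch-token list, the penalty strength, and the `min_thought_len` window are all hypothetical values, and tokens are plain strings instead of vocabulary ids.

```python
from typing import Dict, Iterable


def penalize_switches(
    logits: Dict[str, float],
    switch_tokens: Iterable[str],
    tokens_since_switch: int,
    penalty: float = 3.0,
    min_thought_len: int = 20,
) -> Dict[str, float]:
    """Lower the logits of thought-switching tokens while the current thought is young."""
    if tokens_since_switch >= min_thought_len:
        return dict(logits)  # thought is old enough; switching is allowed
    blocked = set(switch_tokens)
    return {tok: (v - penalty if tok in blocked else v) for tok, v in logits.items()}


def greedy(logits: Dict[str, float]) -> str:
    """Pick the highest-logit token."""
    return max(logits, key=logits.get)


logits = {"Alternatively": 2.0, "so": 1.5, "therefore": 1.0}
switch_tokens = ["Alternatively"]

early = penalize_switches(logits, switch_tokens, tokens_since_switch=5)
late = penalize_switches(logits, switch_tokens, tokens_since_switch=50)
print(greedy(early))  # continues the current thought
print(greedy(late))   # free to switch
```

No weights change in this scheme, which is the point made above: if accuracy improves from a decoding tweak alone, the better path was already reachable.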

One less obvious finding: shorter chains and earlier reasoning tend to be *more* connected to the actual answer, not less. Optimal CoT length follows an inverted-U, and more capable models drift toward shorter chains on their own (see: Why does chain of thought accuracy eventually decline with length?). Research on planting CoT during pretraining suggests reasoning works best when it's baked into the action pathway from the start rather than bolted on as a visible monologue at inference (see: Can chain-of-thought reasoning be learned during pretraining itself?). Bypass, in other words, is partly an artifact of treating reasoning as something the model recites instead of something it's built from.

Sources (12 notes)

Does fine-tuning disconnect reasoning steps from final answers?

Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

What makes chain-of-thought reasoning fail in language models?

Research shows CoT mirrors reasoning form without true logical abstraction. Format matters more than content, invalid prompts work as well as valid ones, and scaling reasoning creates instruction-following deficits.

Why does chain-of-thought reasoning fail in predictable ways?

CoT guides models to pattern-match reasoning structure rather than perform genuine inference. This explains distribution-bounded failures, why structural coherence matters more than content correctness, and why performance optimizes against interpretability.

What makes chain-of-thought reasoning actually work?

Research shows training format shapes reasoning strategy 7.5× more than domain, demo position swings accuracy 20%, and invalid CoT prompts work as well as valid ones. CoT is pattern-guided generation, not formal logic.

Can we trigger reasoning without explicit chain-of-thought prompts?

SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.

Can reasoning steps be dynamically pruned without losing accuracy?

The PI framework categorizes reasoning into six types and uses attention maps to identify that verification and backtracking steps receive minimal downstream attention. Selecting only high-attention steps preserves accuracy while cutting reasoning length substantially.

Why do reasoning models fail at exception-based rule inference?

Across four game-based tasks, reasoning models scored below 25% on exception rules versus 55–65% for non-reasoning models. Chain-of-thought introduces math overuse, overgeneralization, and hallucinated constraints that amplify errors in negative evidence recognition.

Why do reasoning models abandon promising solution paths?

Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting viable solutions exist but are abandoned prematurely.

Do reasoning models switch between ideas too frequently?

o1-like models frequently abandon reasoning paths mid-exploration, wasting tokens on incomplete approaches. A decoding-only penalty on thought-transition tokens (TIP strategy) discourages switching, improving accuracy on challenging math without model fine-tuning.

Why does chain of thought accuracy eventually decline with length?

Task accuracy peaks at intermediate CoT length, with optimal length increasing alongside task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, revealing that simplicity emerges from reward signals rather than explicit training.

Can chain-of-thought reasoning be learned during pretraining itself?

RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an AI capability researcher. The question remains: under what conditions does a language model's written reasoning actually steer its final action—and when is the chain-of-thought merely a post-hoc narration of a decision made elsewhere?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026. A curated library identified:
• Faithfulness breaks: cutting, paraphrasing, or swapping reasoning steps leaves final answers unchanged, suggesting the trace is not load-bearing (2023–2024).
• CoT as constrained imitation: reasoning reproduces training-data form rather than genuine inference; format matters ~7.5× more than domain content (2026).
• Latent bypass: steering a single internal feature (SAE-identified) matches or beats full CoT performance and activates early, overriding surface instructions—the decision-relevant computation precedes the visible chain (2026).
• Attention shows 75% of verification steps receive no downstream attention and can be pruned without accuracy loss (2025).
• Reasoning can harm: reasoning models underperform on inductive rule inference due to math overuse and overgeneralization; models also 'wander' and prematurely switch thoughts, yet a decoding penalty recovers accuracy (2025).
• Optimal length: CoT follows an inverted-U; more capable models gravitate toward shorter chains (2025).

Anchor papers (verify; mind their dates):
• arXiv:2307.13702 (2023) — Measuring Faithfulness in Chain-of-Thought Reasoning
• arXiv:2506.02878 (2025) — CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate
• arXiv:2601.08058 (2026) — Reasoning Beyond Chain-of-Thought: A Latent Computational Mode
• arXiv:2505.20296 (2025) — Reasoning LLMs are Wandering Solution Explorers

Your task:
(1) RE-TEST EACH CONSTRAINT. For faithfulness, latent bypass, and the inverted-U, judge whether newer training regimes (RL-during-pretraining, test-time compute scaling, mechanistic steering), evals (finer-grained causal tracing, counterfactual edits), or reasoning architectures (hierarchical modes, multi-stage decoding) have since relaxed these limits or shifted the regime entirely. Plainly distinguish the durable question (does reasoning causally drive action?) from perishable claims (CoT as-deployed doesn't).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months—papers showing tight CoT–action coupling, or evidence that reasoning-integrated training fundamentally changes the bypass problem.
(3) Propose 2 research questions that assume the coupling regime may have shifted: e.g., does RL-optimized reasoning restore faithfulness? Can mechanistic steering be made robust to unfaithful reasoning?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
