SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation Model Architecture and Internals Training, RL, and Test-Time Scaling

Can we trigger reasoning without explicit chain-of-thought prompts?

This research asks whether models possess latent reasoning capabilities that can be activated through direct feature steering, independent of chain-of-thought instructions. Understanding this matters for making reasoning more efficient and controllable.

Synthesis note · 2026-04-20 · sourced from Cognitive Models Latent

Using Sparse Autoencoders to decompose model activations into interpretable features, a two-stage pipeline identifies latent features causally associated with reasoning behavior. First, SAEs extract sparse features from activations comparing CoT vs non-CoT prompting conditions. Second, targeted steering interventions modulate candidate features and measure downstream reasoning performance.

The central result: steering a single reasoning-related latent feature at the first generation step substantially improves accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT while producing more efficient outputs — fewer tokens, same accuracy.

Three properties of this reasoning mode are striking:

Early triggering. The reasoning-oriented internal state is triggered early in generation, not built up through sequential token production. This contrasts with the H2 assumption that reasoning emerges through the step-by-step construction of a chain.

Override robustness. The latent reasoning mode can override prompt-level instructions that discourage explicit reasoning — including the \no_think instruction used in Qwen models. The internal state takes precedence over surface directives, suggesting the latent mechanism operates at a deeper level than prompt compliance.

Cross-model generality. The finding replicates across six model families up to 70B parameters, suggesting this is not an architecture-specific artifact but a general property of how large language models organize reasoning capability.

The implication is that CoT prompting is one effective but not unique way of activating an underlying reasoning mechanism. Other triggers include: altered decoding procedures (CoT-decoding from Do base models already contain hidden reasoning ability?), soft continuous representations (from Can we explore multiple reasoning paths without committing to one token?), and now direct feature steering. The multiplicity of triggers, all converging on the same capability, is the strongest evidence that the capability is latent and the triggers are interchangeable surface-level activators.

This extends the repertoire of steerable behavioral dimensions from Can we steer reasoning toward brevity without retraining? (reasoning verbosity), Can we track and steer personality shifts during model finetuning? (personality), and Can high-level concepts replace circuit-level analysis in AI? (truthfulness, honesty, morality) to include reasoning activation itself — arguably the most consequential dimension yet.

Inquiring lines that use this note as a source 50

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Related concepts in this collection 8

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map
19 direct connections · 187 in 2-hop network ·dense cluster Open in graph ↗

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Original note title

steering a single SAE-identified reasoning feature matches CoT performance while bypassing explicit chain-of-thought — CoT is one trigger for latent reasoning not its cause