Can models recognize question difficulty before they reason?
Do reasoning language models encode implicit knowledge of problem difficulty in their hidden states, even before generating solution steps? And if so, why don't they act on this knowledge?
S1-Bench's probing analysis suggests that a difficulty signal is already present in LRM representations. A single-layer MLP trained on the final-layer hidden state of the last token of the encoded question predicts difficulty, with accuracy that increases monotonically across difficulty levels. The structure is implicit but linearly decodable: the model itself needs no extra training and no auxiliary signal, only a shallow probe on its existing activations. In that sense, the model knows.
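To make the probing setup concrete, here is a minimal sketch of a one-layer softmax probe. It is not the S1-Bench code: the "hidden states" are synthetic vectors in which difficulty is assumed to shift along a single axis, and every name (`make_synthetic_states`, `train_probe`, the three difficulty levels) is hypothetical. With a real model, the inputs would be the final-layer hidden state of the last question token.

```python
import math
import random

def make_synthetic_states(n_per_level, dim=4, levels=3, noise=0.3, seed=0):
    """Synthetic stand-in for last-token hidden states (an assumption, not real data)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for level in range(levels):
        for _ in range(n_per_level):
            x = [rng.gauss(0.0, noise) for _ in range(dim)]
            # Assumed "difficulty direction": axis 0 shifts with the level.
            x[0] += 2.0 * (level - (levels - 1) / 2)
            xs.append(x)
            ys.append(level)
    return xs, ys

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(x, W, b):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]

def train_probe(xs, ys, levels=3, lr=0.5, steps=300):
    """Full-batch gradient descent on cross-entropy for one linear layer."""
    dim, n = len(xs[0]), len(xs)
    W = [[0.0] * dim for _ in range(levels)]
    b = [0.0] * levels
    for _ in range(steps):
        gW = [[0.0] * dim for _ in range(levels)]
        gb = [0.0] * levels
        for x, y in zip(xs, ys):
            p = softmax(logits_for(x, W, b))
            for k in range(levels):
                err = p[k] - (1.0 if k == y else 0.0)
                gb[k] += err
                for j in range(dim):
                    gW[k][j] += err * x[j]
        for k in range(levels):
            b[k] -= lr * gb[k] / n
            for j in range(dim):
                W[k][j] -= lr * gW[k][j] / n
    return W, b

def predict(x, W, b):
    scores = logits_for(x, W, b)
    return scores.index(max(scores))

def accuracy(xs, ys, W, b):
    return sum(predict(x, W, b) == y for x, y in zip(xs, ys)) / len(xs)

# The probe is the only thing trained; the "model" producing the states is untouched.
train_x, train_y = make_synthetic_states(60, seed=0)
test_x, test_y = make_synthetic_states(30, seed=1)
W, b = train_probe(train_x, train_y)
print(f"held-out probe accuracy: {accuracy(test_x, test_y, W, b):.2f}")
```

The point of the sketch is the shape of the experiment: frozen representations in, a shallow classifier on top, held-out accuracy as evidence that the property is decodable.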
The behavioral result contradicts this internal knowledge. On simple questions that the linear probe correctly classifies as easy, LRMs still produce redundant solution rounds, repeatedly reverify already-correct answers, and show higher average token entropy than the task requires. The hidden-state signal that says "this is easy" is overridden during generation by exploratory behavior that says "let me check again."
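As a rough illustration of the entropy measure mentioned above, the sketch below computes the mean Shannon entropy (in nats) over per-step next-token distributions. The function names and the toy distributions are hypothetical; in practice the distributions would come from the model's softmax outputs at each generated token.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_token_entropy(step_distributions):
    """Average entropy across generation steps; 0.0 for an empty trace."""
    if not step_distributions:
        return 0.0
    return sum(token_entropy(p) for p in step_distributions) / len(step_distributions)

# Toy traces (illustrative only): a confident trace versus a hesitant one.
confident = [[0.97, 0.01, 0.01, 0.01]] * 3
hesitant = [[0.25, 0.25, 0.25, 0.25]] * 3

print(mean_token_entropy(confident) < mean_token_entropy(hesitant))
```

A trace that keeps reopening a settled answer would be expected to sit closer to the hesitant profile than the question's difficulty justifies.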
The authors' interpretation — and the most plausible mechanism — is that models exhibit self-doubt about their own early difficulty judgments. The model perceives the question is simple, then second-guesses that perception, then engages in exploratory generation to compensate for the imagined possibility that its initial assessment was wrong. This is a structural failure mode: the architecture lacks a mechanism to commit to an early difficulty assessment and act on it.
The deeper insight is that LRM overthinking is not a perception failure (the model fails to recognize a simple question) but an action failure (the model recognizes the question is simple but cannot translate that recognition into terminating behavior). This distinction matters for fixes: prompt-engineering for "shorter answers on easy questions" treats it as a perception problem and produces brittle results. Mechanistic fixes that route generation through the difficulty representation — for example, conditioning continued-thinking decisions on the probe output — treat it as the action problem it appears to be.
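One way to picture "conditioning continued-thinking decisions on the probe output" is a small gating rule. This is a sketch under stated assumptions, not a method from the paper: `ThinkingState`, `should_continue_thinking`, the threshold of 0.9, and the round cap are all hypothetical, and a real system would need the probe score and round boundaries to be available at decode time.

```python
from dataclasses import dataclass

@dataclass
class ThinkingState:
    p_easy: float           # probe's probability that the question is easy
    rounds_completed: int   # solution rounds finished so far
    answers_agree: bool     # whether the completed rounds reached the same answer

def should_continue_thinking(state, easy_threshold=0.9, max_rounds=4):
    """Hypothetical gate: commit early when the probe says 'easy'."""
    if state.rounds_completed == 0:
        return True    # always produce at least one solution round
    if state.rounds_completed >= max_rounds:
        return False   # hard cap regardless of difficulty
    if state.p_easy >= easy_threshold and state.answers_agree:
        return False   # act on the difficulty signal instead of re-verifying
    return True

print(should_continue_thinking(ThinkingState(0.95, 1, True)))   # easy, one round done
print(should_continue_thinking(ThinkingState(0.30, 1, True)))   # hard, keep going
```

The design choice worth noting is that the gate consumes the internal difficulty estimate directly, rather than asking the model through a prompt to be brief.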
The methodology generalizes. A linear probe on a hidden state is a cheap diagnostic for any property the model is suspected to track implicitly. If the probe succeeds and the behavior contradicts it, the gap localizes the failure to the perception-to-action interface — not to representation, not to capacity.
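The diagnostic can be reduced to a single number: among cases where the probe is right about a property, how often does behavior ignore it? The sketch below applies that to difficulty. The `Record` fields, the token budget, and the sample data are hypothetical and chosen only to illustrate the calculation.

```python
from dataclasses import dataclass

@dataclass
class Record:
    probe_says_easy: bool   # probe prediction from the hidden state
    truly_easy: bool        # ground-truth label
    response_tokens: int    # length of the generated response

def perception_action_gap(records, token_budget):
    """Share of correctly perceived easy questions whose response exceeds the budget.

    Returns None when the probe never correctly flags an easy question.
    """
    perceived = [r for r in records if r.probe_says_easy and r.truly_easy]
    if not perceived:
        return None
    over_budget = sum(r.response_tokens > token_budget for r in perceived)
    return over_budget / len(perceived)

# Illustrative data only.
records = [
    Record(True, True, 40),     # perceived easy, answered briefly
    Record(True, True, 900),    # perceived easy, overthought
    Record(False, True, 1200),  # perception miss: excluded from the gap
    Record(False, False, 1500), # hard question: excluded
]
gap = perception_action_gap(records, token_budget=200)
print(gap)
```

A high value with an accurate probe points at the perception-to-action interface; a low probe accuracy would instead point back at representation.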
Inquiring lines that use this note as a source (16)
This note is a source for these synthesized inquiries.
- Why do models commit to answers early on easy versus hard tasks?
- What makes a problem instance unfamiliar to a language model?
- What makes active reasoning through dialogue harder than passive reasoning?
- Why do models automatically adjust reasoning length to problem difficulty?
- How should inference budget adapt based on problem difficulty?
- Why do models overthink easy problems and underthink difficult ones?
- How do transformers generate harder solutions when mostly trained on easier problems?
- When is detailed step-by-step reasoning actually counterproductive for solving a problem?
- Do reasoning models switch approaches when encountering local difficulty?
- How does making implicit reasoning requirements explicit change model performance?
- Can reasoning models reject ill-posed questions or do they overthink?
- How do reasoning-related features behave when trained on near-impossible problems?
- Do models verbalize their implicit knowledge when that knowledge influences their output?
- Why do reasoning-optimized models show no resistance advantage on agreement tasks?
- Do models genuinely reason harder on difficult tasks or just appear to?
- How does question difficulty and breadth affect what models learn to reason?
Related concepts in this collection (4)
- Does more thinking time always improve reasoning accuracy?
  Explores whether extending a model's thinking tokens linearly improves performance, or if there's a point beyond which additional reasoning becomes counterproductive.
  the broader phenomenon of overthinking that S1-Bench instantiates; the linear-probe finding is the architectural deepening of that picture
- Why do reasoning models overthink ill-posed questions?
  Explores why models trained for extended reasoning produce drastically longer, less useful responses to unanswerable questions, and whether this represents a fixable training deficit or inherent limitation.
  another action-failure case: models perceive ill-posed-ness but cannot translate perception into rejection
- Does chain-of-thought reasoning reflect genuine thinking or performance?
  When language models generate step-by-step reasoning, are they actually thinking through problems or just producing text that looks like reasoning? This matters for understanding whether extended reasoning tokens add real computational value.
  complementary finding from a different angle: easy-question behavior is performative even when the model knows the question is easy
- Do reasoning models actually use the hints they receive?
  This explores whether language models acknowledge reasoning hints in their explanations when those hints causally influence their answers. Understanding this gap matters for evaluating whether chain-of-thought explanations can be trusted for safety monitoring.
  parallel perception-action gap: models perceive hints but do not verbalize their influence
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs
- DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning
Original note title
problem difficulty is linearly decodable from LRM hidden states before formal reasoning begins — yet models override this signal with exploratory overthinking suggesting architectural self-doubt