Can models recognize question difficulty before they reason?

Do reasoning language models encode implicit knowledge of problem difficulty in their hidden states, even before generating solution steps? And if so, why don't they act on this knowledge?

Synthesis note · 2026-05-18 · sourced from Reasoning Methods CoT ToT

S1-Bench's probing analysis demonstrates that difficulty information is already present in LRM representations. A single-layer MLP trained on the final-layer hidden state of the last token of the encoded question predicts difficulty, with accuracy rising monotonically across difficulty levels. The structure is implicit but linearly decodable: the LRM itself needs no extra training, and the probe needs no specialized architecture or auxiliary signal. The model knows.
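The probing setup described above can be sketched as a softmax-regression probe over fixed feature vectors. This is a minimal illustration, not S1-Bench's code: the hidden states here are synthetic Gaussian vectors standing in for final-layer last-token states, and the dimensionality, the three difficulty levels, the separation strength, and all function names are assumptions made for the example.

```python
import numpy as np

def make_synthetic_states(n_per_level=200, dim=32, n_levels=3, seed=0):
    """Synthetic stand-in for hidden states: difficulty shifts one direction."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    xs, ys = [], []
    for level in range(n_levels):
        noise = rng.normal(size=(n_per_level, dim))
        xs.append(noise + 4.0 * level * direction)  # assumed separation strength
        ys.append(np.full(n_per_level, level))
    x, y = np.concatenate(xs), np.concatenate(ys)
    perm = rng.permutation(len(y))
    return x[perm], y[perm]

def train_linear_probe(x, y, n_levels, lr=0.1, steps=500):
    """Fit a single linear layer with softmax cross-entropy by gradient descent."""
    n, dim = x.shape
    w = np.zeros((dim, n_levels))
    b = np.zeros(n_levels)
    onehot = np.eye(n_levels)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

def probe_predict(x, w, b):
    """Return the most likely difficulty level for each row of x."""
    return np.argmax(x @ w + b, axis=1)

x, y = make_synthetic_states()
split = int(0.8 * len(y))
mean = x[:split].mean(axis=0)  # center features using training data only
w, b = train_linear_probe(x[:split] - mean, y[:split], n_levels=3)
test_accuracy = float(np.mean(probe_predict(x[split:] - mean, w, b) == y[split:]))
print(f"held-out probe accuracy: {test_accuracy:.2f}")
```

With a real model, the rows of `x` would be replaced by hidden states extracted from the LRM while its weights stay frozen; only the probe parameters `w` and `b` are trained.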

The behavioral result contradicts this internal knowledge. On simple questions that the linear probe correctly classifies as easy, LRMs still produce redundant solution rounds, repeatedly re-verify already-correct answers, and emit higher average token entropy than necessary. The hidden-state signal that says "this is easy" is overridden during generation by exploratory behavior that says "let me check again."
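One common reading of average token entropy is the mean Shannon entropy of the next-token distribution over generated positions. The sketch below assumes that definition; the note does not state the exact formula S1-Bench uses, and the function names are illustrative.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def average_token_entropy(step_distributions):
    """Mean entropy across all generated positions in a response."""
    if not step_distributions:
        return 0.0
    return sum(token_entropy(d) for d in step_distributions) / len(step_distributions)

# A confident step (all mass on one token) and a maximally uncertain two-way step.
example = [[1.0, 0.0], [0.5, 0.5]]
print(average_token_entropy(example))
```

Under this reading, a response to an easy question that keeps hedging between continuations would show a higher average than one that commits to its answer.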

The authors' interpretation — and the most plausible mechanism — is that models exhibit self-doubt about their own early difficulty judgments. The model perceives the question is simple, then second-guesses that perception, then engages in exploratory generation to compensate for the imagined possibility that its initial assessment was wrong. This is a structural failure mode: the architecture lacks a mechanism to commit to an early difficulty assessment and act on it.

The deeper insight is that LRM overthinking is not a perception failure (the model fails to recognize a simple question) but an action failure (the model recognizes the question is simple but cannot translate that recognition into terminating behavior). This distinction matters for fixes: prompt-engineering for "shorter answers on easy questions" treats it as a perception problem and produces brittle results. Mechanistic fixes that route generation through the difficulty representation — for example, conditioning continued-thinking decisions on the probe output — treat it as the action problem it appears to be.
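Conditioning continued-thinking decisions on the probe output could look roughly like the sketch below. It is a hypothetical control loop, not a method from the source: `thinking_budget`, `generate_with_budget`, the per-level round budgets, and the two callables standing in for the model are all assumptions introduced for illustration.

```python
from typing import Callable, List, Sequence

def thinking_budget(difficulty_probs: Sequence[float], budgets: Sequence[int]) -> int:
    """Map the probe's most likely difficulty level to a cap on reasoning rounds."""
    level = max(range(len(difficulty_probs)), key=lambda i: difficulty_probs[i])
    return budgets[level]

def generate_with_budget(
    reason_round: Callable[[int], str],             # hypothetical: runs one reasoning round
    answer_is_stable: Callable[[List[str]], bool],  # hypothetical: early-exit check
    max_rounds: int,
) -> List[str]:
    """Run reasoning rounds until the answer stabilizes or the budget is spent."""
    rounds: List[str] = []
    for i in range(max_rounds):
        rounds.append(reason_round(i))
        if answer_is_stable(rounds):
            break
    return rounds

# Usage with stand-in callables: the probe says "easy", so one round is allowed.
probs = [0.80, 0.15, 0.05]   # assumed probe output over (easy, medium, hard)
cap = thinking_budget(probs, budgets=[1, 3, 8])
trace = generate_with_budget(lambda i: f"round {i}", lambda r: False, cap)
print(cap, trace)
```

The design choice is that the cap is set before generation starts, so later self-doubt during decoding cannot reopen the budget.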

The methodology generalizes. A linear probe on a hidden state is a cheap diagnostic for any property the model is suspected to track implicitly. If the probe succeeds and the behavior contradicts it, the gap localizes the failure to the perception-to-action interface — not to representation, not to capacity.
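As a rough illustration of this diagnostic, the gap can be quantified as the share of probe-easy questions whose responses still exceed a length budget. The metric, the function name, and the token budget below are assumptions for the example rather than anything defined in the source.

```python
from typing import Sequence

def perception_action_gap(
    probe_says_easy: Sequence[bool],
    response_tokens: Sequence[int],
    easy_token_budget: int,
) -> float:
    """Fraction of probe-easy questions answered with more tokens than the budget."""
    easy_lengths = [n for is_easy, n in zip(probe_says_easy, response_tokens) if is_easy]
    if not easy_lengths:
        return 0.0
    over_budget = sum(1 for n in easy_lengths if n > easy_token_budget)
    return over_budget / len(easy_lengths)

# Three questions are flagged easy; two of them still get long responses.
gap = perception_action_gap(
    probe_says_easy=[True, True, False, True],
    response_tokens=[50, 900, 1200, 400],
    easy_token_budget=200,
)
print(f"perception-to-action gap: {gap:.2f}")
```

A value near zero would suggest behavior follows the internal signal; a high value alongside an accurate probe points at the perception-to-action interface.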

Inquiring lines that read this note 16

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should models express uncertainty rather than forced confident answers?

Why do models commit to answers early on easy versus hard tasks?

How does example difficulty affect learning efficiency in language models?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

What makes active reasoning through dialogue harder than passive reasoning?

How should inference compute be adaptively allocated based on prompt difficulty?

How should inference budget adapt based on problem difficulty?

When do additional thinking tokens stop improving reasoning performance?

Why do models overthink easy problems and underthink difficult ones?

How does latent reasoning compare to verbalized chain-of-thought?

When is detailed step-by-step reasoning actually counterproductive for solving a problem?

What capability tradeoffs emerge when scaling model reasoning abilities?

Do reasoning traces faithfully represent or merely mimic actual model reasoning?

How does making implicit reasoning requirements explicit change model performance?

How can models identify insufficient information and respond appropriately without guessing?

Can reasoning models reject ill-posed questions or do they overthink?

Is model self-awareness based on genuine introspection or pattern matching?

Do models verbalize their implicit knowledge when that knowledge influences their output?

How do training data properties shape reasoning capability development?

How does question difficulty and breadth affect what models learn to reason?

Related concepts in this collection 4

Does more thinking time always improve reasoning accuracy? Explores whether extending a model's thinking tokens linearly improves performance, or if there's a point beyond which additional reasoning becomes counterproductive.
the broader phenomenon of overthinking that S1-Bench instantiates; the linear-probe finding is the architectural deepening of that picture
Why do reasoning models overthink ill-posed questions? Explores why models trained for extended reasoning produce drastically longer, less useful responses to unanswerable questions—and whether this represents a fixable training deficit or inherent limitation.
another action-failure case: models perceive ill-posed-ness but cannot translate perception into rejection
Does chain-of-thought reasoning reflect genuine thinking or performance? When language models generate step-by-step reasoning, are they actually thinking through problems or just producing text that looks like reasoning? This matters for understanding whether extended reasoning tokens add real computational value.
complementary finding from a different angle: easy-question behavior is performative even when the model knows the question is easy
Do reasoning models actually use the hints they receive? This explores whether language models acknowledge reasoning hints in their explanations when those hints causally influence their answers. Understanding this gap matters for evaluating whether chain-of-thought explanations can be trusted for safety monitoring.
parallel perception-action gap: models perceive hints but do not verbalize their influence

Original note title

problem difficulty is linearly decodable from LRM hidden states before formal reasoning begins — yet models override this signal with exploratory overthinking suggesting architectural self-doubt
