SYNTHESIS NOTE

Can confidence patterns reveal overthinking versus underthinking?

This explores whether real-time confidence signals can diagnose when a reasoning model is trapped in redundant deliberation versus committing prematurely, and whether steering based on these signals can balance both failure modes.

Synthesis note · 2026-04-01 · sourced from Reasoning by Reflection

Overthinking and underthinking are dual failures, and methods that suppress one often induce the other. Suppressing reflective keywords or truncating reasoning length reduces overthinking but causes underthinking, because the model no longer explores enough. Forcing longer chains reduces underthinking but generates redundancy. ReBalance addresses this by treating confidence as a continuous diagnostic signal rather than applying binary interventions.

The diagnostic: confidence values correlate with reasoning behavior in two interpretable ways.

High confidence variance: the model switches indecisively between reasoning paths, which produces redundant steps and delays convergence. This is overthinking: the model registers that something is wrong but cannot commit.
Consistent overconfidence: the model locks onto an incorrect reasoning path too early. This is underthinking: the model commits without adequate exploration.
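
The two signatures above can be sketched as a simple classifier over a window of per-step confidence values. This is an illustrative sketch only, not the ReBalance implementation: the function name `diagnose`, the window size, and both thresholds are hypothetical, and it assumes each reasoning step has already been reduced to a single confidence score in [0, 1] (for example, the mean top-token probability of the step).

```python
from statistics import mean, pvariance


def diagnose(confidences, window=8, var_threshold=0.02, high_conf=0.9):
    """Label the most recent window of step confidences.

    All thresholds are illustrative assumptions, not values from the source.
    """
    recent = confidences[-window:]
    if len(recent) < 2:
        return "normal"  # too little evidence to judge either mode
    if pvariance(recent) > var_threshold:
        # Confidence swings between steps: indecisive path switching.
        return "overthinking"
    if mean(recent) > high_conf:
        # Stable and very high confidence: possible premature commitment.
        return "underthinking"
    return "normal"


if __name__ == "__main__":
    oscillating = [0.9, 0.3, 0.85, 0.2, 0.95, 0.4, 0.9, 0.3]
    locked_in = [0.97] * 8
    print(diagnose(oscillating))  # overthinking
    print(diagnose(locked_in))    # underthinking
```

Stable high confidence is also what a model shows when it is simply right, so a rule like this can only flag candidates for underthinking; it cannot confirm that the path is incorrect.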

The mechanism: using a small-scale dataset, identify the reasoning steps that indicate each mode. Aggregate the hidden states of those steps into one prototype per reasoning mode. Compute a steering vector that encodes the transition from the overthinking prototype to the underthinking prototype. A dynamic control function then modulates the vector's strength and direction based on real-time confidence: it prunes redundancy during overthinking and promotes exploration during underthinking.
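
The steps above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: hidden states are short lists of floats, prototypes are simple means, and `control_coefficient` is one hypothetical choice of control function with made-up thresholds. The source does not specify the aggregation, the layer, or the functional form ReBalance actually uses.

```python
def mean_vector(states):
    """Average a list of equal-length hidden-state vectors into a prototype."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def steering_vector(over_states, under_states):
    """Direction from the overthinking prototype to the underthinking prototype."""
    p_over = mean_vector(over_states)
    p_under = mean_vector(under_states)
    return [u - o for o, u in zip(p_over, p_under)]


def control_coefficient(conf_mean, conf_var,
                        var_threshold=0.02, high_conf=0.9, max_strength=1.0):
    """Hypothetical control function returning a signed steering strength.

    Positive: move along the vector (away from overthinking, pruning redundancy).
    Negative: move against it (away from underthinking, promoting exploration).
    """
    if conf_var > var_threshold:
        excess = (conf_var - var_threshold) / var_threshold
        return min(max_strength, max_strength * excess)
    if conf_mean > high_conf:
        excess = (conf_mean - high_conf) / (1.0 - high_conf)
        return -min(max_strength, max_strength * excess)
    return 0.0  # confidence looks healthy: leave the hidden state alone


def steer(hidden, vector, strength):
    """Add the scaled steering vector to a hidden state."""
    return [h + strength * v for h, v in zip(hidden, vector)]


if __name__ == "__main__":
    v = steering_vector([[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0], [0.0, 4.0]])
    alpha = control_coefficient(conf_mean=0.6, conf_var=0.06)
    print(v, alpha, steer([1.0, 1.0], v, alpha))
```

Because the coefficient is signed and scaled rather than on/off, the same vector serves both corrections, which is the sense in which the intervention is continuous rather than binary.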

Why it's training-free: the steering vector captures the model's inherent reasoning dynamics. It is extracted from the model's own hidden states rather than learned through additional training. Because it operates on intrinsic representations, it generalizes across unseen data and tasks (math, QA, coding). This makes it plug-and-play across models from 0.5B to 32B.

Building on "Can we steer reasoning toward brevity without retraining?", ReBalance extends the activation-steering approach from length compression to reasoning-quality management. ASC steers between verbose and concise modes; ReBalance steers between overthinking and underthinking, a qualitative distinction rather than a purely quantitative one.

Relative to "Does more thinking time always improve reasoning accuracy?", ReBalance provides the dynamic mechanism that the threshold finding calls for: instead of a fixed cutoff, confidence-based steering continuously adjusts the reasoning trajectory.

Inquiring lines that read this note 77

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should models express uncertainty rather than forced confident answers?

What capability tradeoffs emerge when scaling model reasoning abilities?

How does latent reasoning compare to verbalized chain-of-thought?

Why do reward structures fail to shape long-term agent learning?

Can inner thoughts solve the importance recognition problem for agents?

Can model confidence signals reliably improve reasoning quality and calibration?

Does decoupling planning from execution improve multi-step reasoning accuracy?

What makes bilevel metacognition architectural rather than emergent in current systems?

What factors beyond surface content determine how readers extract meaning differently?

What makes schema identification necessary after assessing thoughts and evidence?

How should conversational agents balance goal-driven initiative with user control?

How does AI assistance affect human cognitive development and reasoning autonomy?

When do additional thinking tokens stop improving reasoning performance?

Do reasoning traces faithfully represent or merely mimic actual model reasoning?

What saliency patterns distinguish successful from failed chain-of-thought reasoning?

How should iterative research systems allocate reasoning per search step?

How does overthinking in early turns degrade later retrieval rounds?

Why does self-revision increase model confidence while degrading accuracy?

How do we evaluate AI systems when user perception misleads actual performance?

How can humans calibrate appropriate trust in AI systems?

Does RLHF training sacrifice accuracy and grounding for user agreement?

Can preference optimization reduce overthinking without sacrificing accuracy?

How does reasoning graph topology affect breakthrough insights and generalization?

What distinguishes redundant cycles from productive reconsidering cycles?

Why do persona-level simulations fail to predict individual preferences accurately?

Does persona-level grouping systematically trigger confidence-misdirection failures in practice?

Why do reasoning models fail at systematic problem-solving and search?

Why do agents confidently report success despite actually failing tasks?

What structural features enable agents to detect when understanding has broken down?

How does test-time aggregation affect reasoning correctness and reliability?

What signals detect when consensus training is silently degrading performance?

Do language models develop causal world models or rely on statistical patterns?

Can models track dynamic mental state changes better than static beliefs?

How do self-generated feedback mechanisms enable effective model learning?

What distinguishes intrinsic metacognition from extrinsic human-designed loops?

How can models identify insufficient information and respond appropriately without guessing?

How does proactive critical thinking detect when information is incomplete?

Does AI fluency substitute for verifiable accuracy in human judgment?

Why do benchmark improvements fail to reflect actual reasoning quality?

Can benchmark improvements hide degradation of deliberative reasoning?

How should dialogue systems represent uncertainty from noisy speech input?

How does structured self-dialogue improve uncertainty assessment over confidence scores?

How do soft continuous representations explore multiple reasoning paths simultaneously?

How does continuous soft thinking explore multiple paths without explicit training?

How can AI systems learn from failures without cascading errors?

What pretraining choices and baseline capability constrain reinforcement learning gains?

Does RL training redirect self-doubt into productive gap analysis?

Do corrupted reasoning traces serve as effective supervision signals?

How does confidence filtering improve selection of reasoning traces?

Related concepts in this collection 4

Can we steer reasoning toward brevity without retraining? This explores whether model reasoning style occupies learnable geometric directions in activation space, and whether we can shift toward concise thinking by steering through that space without expensive retraining.
ASC compresses length; ReBalance steers reasoning quality
Does more thinking time always improve reasoning accuracy? Explores whether extending a model's thinking tokens linearly improves performance, or if there's a point beyond which additional reasoning becomes counterproductive.
ReBalance provides the dynamic mechanism: confidence-based steering vs fixed threshold
Do reasoning models switch between ideas too frequently? Research explores whether o1-like models abandon promising reasoning paths prematurely by switching to different approaches without sufficient depth, and whether penalizing such transitions could improve accuracy.
complementary: underthinking penalty addresses premature switching; ReBalance addresses premature commitment via overconfidence detection
When should an agent actually stop and deliberate? How can models detect when deliberation over action choices is genuinely needed versus wasteful? This matters because unbounded action spaces make universal deliberation intractable, yet skipping it entirely risks missing critical errors.
both use uncertainty/confidence as the trigger for compute allocation
