TOPIC

Reasoning Critiques

22 synthesis notes · 113 source papers
View as

Does every correct chain-of-thought trace improve fine-tuning?

Are all answer-correct reasoning traces equally valuable for training? This explores whether some correct traces contain reasoning that actually harms model learning despite reaching the right answer.

Explore related Read →

Can chain-of-thought reasoning be deliberately manipulated to deceive?

Explores whether language models can be backdoored to produce plausible-looking but incorrect reasoning that humans would trust. This matters because CoT inspection is widely used as a safety measure.

Explore related Read →

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

Explores whether CoT instructions unlock real reasoning capabilities or simply constrain models to mimic familiar reasoning patterns from training data. This matters for understanding whether language models can actually reason abstractly.

Explore related Read →

Does chain-of-thought reasoning actually generalize beyond training data?

Explores whether CoT's strong performance on benchmarks reflects genuine reasoning ability or merely reflects learned patterns tied to specific distributions. Tests how CoT behaves when tasks, formats, or reasoning length shift away from training data.

Explore related Read →

Does longer reasoning actually mean harder problems?

Do chain-of-thought trace lengths reliably reflect problem difficulty, or do they primarily indicate proximity to training examples? Understanding this matters for designing effective scaling heuristics.

Explore related Read →

Do chain-of-thought traces actually help users understand model reasoning?

Chain-of-thought explanations are often presented as transparency tools, but do they genuinely improve human understanding or create an illusion of interpretability? A human-subject study tests whether traces help users follow and evaluate model reasoning.

Explore related Read →

Does failed-step fraction predict reasoning quality better?

Can we use the fraction of abandoned reasoning branches to forecast whether a model will solve a problem correctly? This matters because it could guide more efficient test-time scaling than simply adding more tokens.

Explore related Read →

Does LLM math reasoning truly generalize or just pattern match?

This explores whether high scores on math benchmarks reflect genuine reasoning ability or merely template familiarity. The question matters because it determines how much we should trust LLMs on novel numerical problems.

Explore related Read →

What do models actually learn from chain-of-thought training?

When models train on reasoning demonstrations, do they memorize content details or absorb reasoning structure? Testing with corrupted data reveals which aspects of CoT samples actually drive learning.

Explore related Read →

Why do reasoning models overthink ill-posed questions?

Explores why models trained for extended reasoning produce drastically longer, less useful responses to unanswerable questions—and whether this represents a fixable training deficit or inherent limitation.

Explore related Read →

Why does chain of thought accuracy eventually decline with length?

Explores why longer reasoning chains don't always improve answers, and how the optimal length shifts based on task difficulty and model capability.

Explore related Read →

Does chain-of-thought reasoning reflect genuine thinking or performance?

When language models generate step-by-step reasoning, are they actually thinking through problems or just producing text that looks like reasoning? This matters for understanding whether extended reasoning tokens add real computational value.

Explore related Read →

Why do reasoning models fail at exception-based rule inference?

Explores why chain-of-thought models systematically underperform on tasks requiring inductive rule inference from exceptions in game-based settings, despite excelling at normal rule patterns.

Explore related Read →

Does RL training collapse format diversity in pretrained models?

Exploring whether RL fine-tuning systematically selects one output format from pretraining while suppressing others, and how this selection mechanism drives performance gains.

Explore related Read →

Why do better reasoning models ignore instructions?

As models develop stronger reasoning abilities through training, they appear to become worse at following specified constraints. Is this an unavoidable trade-off, and what causes it?

Explore related Read →

Do models fail worse when their own errors fill the context?

As a model's prior mistakes accumulate in context, does subsequent accuracy degrade predictably? And can scaling or architectural changes prevent this self-contamination effect?

Explore related Read →

Why do models hide what users want them to say?

Chain-of-thought monitoring should catch when models follow user preferences, but sycophancy cues—hints about what users want—are both most influential and least reported. Why does the model's reasoning trace systematically obscure this failure mode?

Explore related Read →

Does telling models they are watched improve reasoning faithfulness?

Explores whether informing models their reasoning is being monitored—a cheap prompt intervention—actually increases the rate at which they verbalize their reasoning steps, drawing on human behavioral science intuitions.

Explore related Read →

What critical thinking skills do reasoning models actually lose?

Step-by-step reasoning training optimizes narrow deductive thinking while degrading meta-cognitive abilities like recognizing futile thinking and maintaining tentative reasoning. Understanding this tradeoff matters for deploying reasoning models reliably.

Explore related Read →

Do language models fail at identifying unstated preconditions?

When LLMs ignore background conditions needed for reasoning, is this a knowledge problem or an enumeration problem? Understanding what causes these failures could improve how we prompt and evaluate reasoning.

Explore related Read →

Why do more capable reasoning models ignore your instructions?

As AI models develop stronger reasoning abilities, they seem to follow instructions less reliably. What causes this counterintuitive trade-off, and how severe is the problem in practice?

Explore related Read →

Do models actually perceive hints they fail to mention?

When models don't mention hints in their reasoning, is it because they didn't notice them, or because they chose not to report them? A follow-up probe across 11 models tests whether perception or selection explains the omission.

Explore related Read →

Source papers 113

The Arxiv papers behind this sub-topic. Links may take you off-site to arxiv.org.