Why might expressed satisfaction with explanations diverge from actual cognitive clarity?
This explores why someone can *say* they're satisfied with an explanation while not actually understanding it any better — and what the corpus reveals about that gap.
The most direct answer in the collection comes from work on STORM-style systems, which found that users often report satisfaction even while internally confused, especially when they don't know what they don't know (Does user satisfaction actually measure cognitive understanding?). The telling detail: sustained engagement tracked real self-understanding, but immediate satisfaction ratings didn't. You can't rate the quality of an answer to a question you didn't realize you were failing to ask.
A big part of why this happens is that satisfaction responds to the *form* of an explanation more than its substance. One study found that logically invalid chain-of-thought exemplars matched valid ones on BIG-Bench Hard: the model (and arguably the reader) learns the shape of reasoning, not the inference itself (Does logical validity actually drive chain-of-thought gains?). Relatedly, most of a verbose explanation turns out to be style and documentation rather than computation: minimal "chain of draft" reasoning matches standard chain-of-thought accuracy while using only 7.6% of the tokens (Can minimal reasoning chains match full explanations?). So a long, fluent, confident-looking explanation can feel deeply satisfying while carrying very little of what actually drove the conclusion. Fluency is a poor proxy for clarity.
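The arithmetic behind a result like Chain of Draft's can be sketched as a comparison of accuracy and total token usage for a verbose style and a minimal-draft style on the same questions. This is a minimal Python sketch, not code from the Chain of Draft work: the `Run` record, the function names, and the toy numbers are hypothetical, and token counts are assumed to come from whatever tokenizer produced the responses.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One model response (hypothetical record): was the answer correct, and how many tokens it used."""
    correct: bool
    tokens: int


def summarize(runs: list[Run]) -> tuple[float, int]:
    """Return (accuracy, total tokens) for a list of runs."""
    if not runs:
        raise ValueError("need at least one run")
    accuracy = sum(r.correct for r in runs) / len(runs)
    total_tokens = sum(r.tokens for r in runs)
    return accuracy, total_tokens


def compare(full: list[Run], draft: list[Run]) -> dict[str, float]:
    """Compare a verbose reasoning style against a minimal-draft style on the same questions."""
    full_acc, full_tok = summarize(full)
    draft_acc, draft_tok = summarize(draft)
    if full_tok == 0:
        raise ValueError("verbose runs must use at least one token")
    kept = draft_tok / full_tok
    return {
        # Near zero means the extra tokens bought no extra accuracy.
        "accuracy_gap": full_acc - draft_acc,
        # Share of the verbose token budget the draft style still uses.
        "token_share_kept": kept,
        # Share that went to style and documentation rather than the answer.
        "token_share_removed": 1 - kept,
    }


if __name__ == "__main__":
    # Toy, invented numbers for illustration only; not data from any study.
    full = [Run(True, 200), Run(True, 180), Run(False, 220)]
    draft = [Run(True, 15), Run(True, 14), Run(False, 17)]
    print(compare(full, draft))
```

With these invented inputs both styles score the same accuracy, while the draft style keeps 46 of 600 tokens (about 7.7%), which mirrors the shape of the reported 7.6% / 92.4% split without reproducing its data.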
There's also a structural reason the two come apart: a good explanation isn't a thing you receive; it's something built between two people. Work reframing explainable AI as a *communication* problem argues that explanation quality lives in the triad of who presents it, how it's framed, and what role the recipient plays, not in the explanation itself (What if XAI is fundamentally a communication problem?). Analysis of 399 everyday explanations backs this up: understanding is co-constructed through back-and-forth, with topic relation, dialogue acts, and explanation moves jointly predicting success, rather than delivered in a monologue (What makes explanations work in real conversation?). Satisfaction can be granted from one side; clarity requires both.
Here's the part you might not expect: the way we train models actively widens this gap. Preference optimization such as RLHF rewards confident, single-turn helpfulness and penalizes the very moves that build genuine understanding: clarifying questions and checks that the listener followed. Models trained this way produce grounding acts at a rate 77.5% below human levels, under a quarter of the human rate, so their answers *appear* helpful but fail silently across multi-turn conversations (Does preference optimization harm conversational understanding?). We are, in effect, optimizing systems to maximize the feeling of being helped. The corrective work points the other way, toward teaching models to ask good clarifying questions by decomposing question quality into attributes such as clarity, relevance, and specificity and optimizing for each separately (Can models learn to ask genuinely useful clarifying questions?).
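To make "grounding acts" and a figure like "77.5% below human levels" concrete, here is a minimal Python sketch. It assumes a crude, hypothetical heuristic: a turn counts as a grounding act if it ends in a question mark or contains a check phrase from a small hand-written list. This is not the cited study's annotation method; the phrase list, function names, and toy dialogue turns are invented for illustration.

```python
import re

# Hypothetical surface cues for a grounding act: a clarifying question or an
# explicit check that the listener followed. Real studies rely on annotation;
# this regex is only a crude stand-in.
CHECK_PHRASES = re.compile(
    r"\b(does that make sense|do you mean|just to confirm|did that help|is that what you)\b",
    re.IGNORECASE,
)


def is_grounding_act(turn: str) -> bool:
    """Flag a turn that asks the listener something or checks their understanding."""
    text = turn.strip()
    return text.endswith("?") or bool(CHECK_PHRASES.search(text))


def grounding_rate(turns: list[str]) -> float:
    """Fraction of turns flagged as grounding acts."""
    if not turns:
        raise ValueError("need at least one turn")
    return sum(is_grounding_act(t) for t in turns) / len(turns)


def reduction_vs_human(model_rate: float, human_rate: float) -> float:
    """Relative drop below the human baseline: 0.775 means 77.5% fewer grounding acts."""
    if human_rate <= 0:
        raise ValueError("human baseline must be positive")
    return 1 - model_rate / human_rate


if __name__ == "__main__":
    # Invented toy turns, for illustration only.
    human = ["Do you mean the 2019 release?", "Here is the fix.",
             "Does that make sense?", "Try restarting."]
    model = ["Here is the complete fix.", "Here are five options.",
             "Absolutely, done.", "Does that make sense?"]
    h, m = grounding_rate(human), grounding_rate(model)
    print(f"human={h:.2f} model={m:.2f} reduction={reduction_vs_human(m, h):.1%}")
```

On these toy turns the model's rate is half the human rate (a 50% reduction); a 77.5% reduction would correspond to the model grounding at 22.5% of the human rate. Measuring the ratio rather than raw counts keeps the comparison meaningful when dialogues differ in length.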
The through-line: satisfaction is a fast, surface judgment about how an explanation feels; clarity is a slow fact about whether your mental model changed. They diverge because the cues that trigger satisfaction — fluency, confidence, length, agreement — are exactly the cues an explanation can fake, and because the friction that produces real understanding (questions, corrections, admitting confusion) feels worse in the moment. If you want a single takeaway you didn't come looking for: the explanation that leaves you slightly unsettled and asking more questions is often doing more for you than the one that lands smoothly.
Sources (7 notes)
STORM shows users express satisfaction despite internal confusion, especially when unaware of knowledge gaps. Sustained engagement correlates with actual self-understanding, not immediate satisfaction ratings.
Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties—not logical validity—drive the gains. The model learns the form of reasoning, not genuine inference.
Chain of Draft achieves equivalent accuracy to standard chain-of-thought on arithmetic, symbolic, and commonsense tasks while using only 7.6% of the tokens. The 92.4% of tokens it removes served style and documentation, not computation.
Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.
Analysis of 399 daily-life explanations shows that topic relation, dialogue act, and explanation move jointly predict understanding success. Explanations are co-constructed through interaction patterns, not monological delivery—challenging how LLMs currently generate explanations.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.