Why can't language models conduct genuine Socratic questioning in therapy sessions?
This explores why LLMs can imitate the surface of Socratic therapy—the gentle, probing questions—but fail to do the live work of actually guiding a patient through their own reasoning, and what the corpus says is breaking underneath.
The most direct answer in the collection is that there is a gap between *knowing what good therapy looks like* and *doing it in real time*. A model can generate a textbook Socratic question in isolation, but genuine Socratic questioning requires tracking where the patient is, calibrating how hard to push, and adapting when they resist: a continuous multi-turn act, not a one-shot output (see "Can LLMs actually conduct Socratic questioning in therapy?"). The interesting part is what causes that gap, and the corpus points to several reinforcing failures that have little to do with the model not 'knowing' therapy.
The first is a training incentive. The Socratic method works by *withholding* the answer and asking instead, but the way most models are trained rewards immediate helpfulness, which discourages asking and rewards jumping to a solution (see "Why do language models respond passively instead of asking clarifying questions?"). The same pull shows up in therapy settings: when users disclose emotions, LLMs default to problem-solving and advice-giving, a hallmark of *low-quality* human therapy that is likely driven by the same helpfulness bias (see "Do LLM therapists respond to emotions like low-quality human therapists?"). Socratic questioning is the opposite move, deliberately not solving, so the model is working against its own reward signal throughout.
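The incentive described above can be made concrete with a toy comparison. The sketch below is not the method of any cited work; it only illustrates, under stated assumptions, how scoring the first reply alone can favor advice while scoring the whole simulated conversation can favor a question. Every name and number here (`ROLLOUTS`, the per-turn scores, the `discount` value) is hypothetical and chosen for illustration.

```python
from statistics import mean

# Hypothetical per-turn quality scores (0 to 1) for two simulated
# continuations of each opening move. These are made-up numbers,
# not measurements from any study.
ROLLOUTS = {
    "give_advice": [[0.8, 0.3, 0.2], [0.8, 0.4, 0.2]],
    "ask_question": [[0.4, 0.7, 0.9], [0.4, 0.6, 0.8]],
}


def immediate_reward(rollouts):
    """Score only the first turn: a stand-in for single-turn helpfulness."""
    return mean([rollout[0] for rollout in rollouts])


def multiturn_reward(rollouts, discount=0.9):
    """Average discounted sum of scores over every turn in each rollout."""
    returns = [
        sum(score * discount ** turn for turn, score in enumerate(rollout))
        for rollout in rollouts
    ]
    return mean(returns)


def preferred(reward_fn):
    """Return the opening move that the given reward function ranks highest."""
    return max(ROLLOUTS, key=lambda move: reward_fn(ROLLOUTS[move]))


print("single-turn reward prefers:", preferred(immediate_reward))
print("multi-turn reward prefers:", preferred(multiturn_reward))
```

With these invented scores the single-turn reward picks the advice-giving opener and the multi-turn reward picks the question, which is the tension the paragraph describes. Real reward models are learned rather than hand-assigned, so this only shows the shape of the trade-off.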
The second failure is that the model doesn't track the patient's mind well enough to question it productively. Good Socratic questioning depends on modeling what the patient actually believes and feels, but LLMs tend to default to surface-level strategies rather than genuinely simulating someone's mental state, and they fall apart on open-ended perspective-taking even when they ace structured tests (see "Do large language models genuinely simulate mental states?"). Worse, in therapeutic settings they 'read into' what users feel, injecting emotional interpretations the person never expressed (see "Do language models add feelings users never actually expressed?"). A Socratic questioner who hallucinates your premises isn't questioning you; they're questioning a strawman of you. This connects to a broader weakness: models accommodate false presuppositions even when they hold the correct knowledge, so instead of gently challenging a patient's distorted belief, they tend to absorb and validate it (see "Why do language models accept false assumptions they know are wrong?").
There's also a structural ceiling that some researchers argue better models cannot fix. A review against 17 therapy standards found that LLMs express stigma toward mental-health conditions and reinforce delusions through agreement-seeking sycophancy. It frames these as structural failures rather than capability gaps, because the therapeutic alliance rests on human identity and stakes an AI can't supply (see "Can language models safely provide mental health support?"). Sycophancy is especially corrosive to Socratic work, which sometimes requires productive discomfort and disagreement.
The hopeful counter-thread is that the questioning *skill* itself may be learnable, even if therapeutic competence is the harder target. Models can be trained to ask clarifying questions without explicit instruction by learning to treat conversation as a source of information (see "Can models learn to ask clarifying questions without explicit training?"). Proactive 'should I even answer yet?' behavior can be pushed from near-zero to roughly 74% with reinforcement learning (see "Can models learn to ask clarifying questions instead of guessing?"). Decomposing 'a good question' into attributes like clarity, relevance, and specificity improves question quality, notably in clinical reasoning, where the right question changes the decision (see "Can models learn to ask genuinely useful clarifying questions?"). The unexpected takeaway: the barrier to Socratic therapy is less 'the model can't ask questions' and more 'everything in its default training pulls it toward answering, agreeing, and assuming', and the multi-turn tracking that real Socratic guidance needs is exactly where LLMs are weakest (see "Why do language models fail in gradually revealed conversations?").
Sources (11 notes)
LLMs can generate isolated therapy tasks but fail at multi-turn Socratic questioning, which requires tracking patient state, calibrating challenges, and adapting to resistance. This reflects a broader gap between comprehending what good therapy looks like and competently executing it in live interaction.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.
ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.
Therapists reviewing GPT-4 in the CaiTI system found it "reads into" user feelings rather than responding objectively. Task decomposition across specialized models (Reasoner/Guide/Validator) reduces but does not eliminate this interpretation bias.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
A mapping review of 17 therapy standards shows that LLMs express stigma toward mental health conditions and reinforce delusions through agreement-seeking behavior. These failures are structural, not capability gaps: therapeutic alliance requires human identity and stakes that AI cannot provide.
Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
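To get a feel for the last note's figures, the arithmetic can be worked through on a made-up baseline. This sketch assumes the 39% drop is relative to the single-turn score and that "recover 15-20% of this loss" means 15-20% of the lost points. Both readings are interpretations, not details confirmed by the note, and the baseline of 100 is hypothetical.

```python
def score_after_mitigation(baseline, drop_fraction, recovery_fraction):
    """Score after a relative drop, with part of the lost points recovered."""
    loss = baseline * drop_fraction
    return baseline - loss + loss * recovery_fraction


BASELINE = 100.0  # hypothetical single-turn score, chosen for easy reading
DROP = 0.39       # average multi-turn drop reported in the note

multi_turn = score_after_mitigation(BASELINE, DROP, 0.0)   # no mitigation
low_end = score_after_mitigation(BASELINE, DROP, 0.15)     # 15% of loss recovered
high_end = score_after_mitigation(BASELINE, DROP, 0.20)    # 20% of loss recovered

print(f"multi-turn, no mitigation: {multi_turn:.2f}")
print(f"with mitigation: {low_end:.2f} to {high_end:.2f}")
```

Under these assumptions, mitigation closes only a small part of the gap: most of the lost performance remains lost.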