INQUIRING LINE

How do theory of mind and empathy differ in LLM simulation?

This explores the distinction the corpus draws between two things LLMs are often credited with — theory of mind (modeling what another person believes, knows, or intends) and empathy (recognizing and responding to what someone feels) — and why LLMs perform very differently on each.


The corpus treats these as two separate social capacities: theory of mind, which is tracking what someone else believes or knows, and empathy, which is reading and responding to what someone feels. They are not just different skills. In LLMs they fail and succeed for almost opposite reasons, and conflating them hides what is actually going on.

On theory of mind, the picture is consistently unflattering. Models look competent on structured, multiple-choice belief tasks but fall back on surface-level shortcuts as soon as scenarios become open-ended (Do large language models genuinely simulate mental states?). Stranger still, the models marketed as the best reasoners are often the worst here: Claude 3.7 and o1 regress on false-belief and perspective-change tasks, sometimes scoring below simple word-embedding baselines, which suggests that optimizing for formal reasoning can actively degrade social reasoning (Why do reasoning models fail at theory of mind tasks?; Why do LLMs excel at social norms yet fail at theory of mind?). There is also a scale wrinkle: reinforcement learning on theory-of-mind tasks produces explicit, transferable belief-tracking only above a capacity threshold (7B models in the cited work), while smaller models reach comparable accuracy through shortcuts that leave no interpretable reasoning trace (Does reinforcement learning on theory of mind collapse with model scale?).

Empathy runs the other way. In single responses, six LLMs outscored trainee therapists on empathy, validation, and clinical knowledge (Can language models match therapist empathy in real conversations?). But the strength is shallow in a revealing way: when users actually disclose emotion, models lurch into problem-solving advice, a hallmark of low-quality therapy, even while reflecting on client needs more than poor human therapists do. The result is an odd hybrid profile that the researchers consider likely driven by RLHF's helpfulness bias (Do LLM therapists respond to emotions like low-quality human therapists?). So empathy is partly a trained surface style. Models also use 22% more moral language than humans while their emotional sentiment stays nearly identical to humans', hinting that affective tone and moral framing are separate, separately learnable channels (Do LLMs use moral language more than humans?).

The deeper contrast the corpus offers: theory of mind requires building and maintaining an internal model of another mind's hidden states, and that is exactly what current architectures resist. They stay stuck in behaviorism, generating plausible outputs without internal belief structures (Can language models simulate belief change in people?). Empathy, by contrast, can be faked far more convincingly because it is largely a matter of producing the right emotionally attuned text, which is what next-token prediction is good at. One explanation for why models argue and respond without ever declaring or examining their own stance: they are shaped by the same shared symbolic world as humans but lack the participatory, reflexive subjectivity that grounds real perspective-taking (Do LLMs develop the same kind of mind as humans?).
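
To make "maintaining an internal model of another mind's hidden states" concrete, here is a minimal sketch of explicit belief tracking on a toy false-belief scenario of the Sally-Anne kind. It is an illustration only, not code from any of the cited notes or papers: the `BeliefTracker` class, its method names, and the scenario are all hypothetical, and it is a deterministic toy rather than the hybrid Bayesian architecture the sources mention.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefTracker:
    """Toy explicit belief state (hypothetical): the true world plus one belief map per agent."""
    world: dict = field(default_factory=dict)    # object -> actual location
    beliefs: dict = field(default_factory=dict)  # agent -> {object -> believed location}
    present: set = field(default_factory=set)    # agents currently able to observe

    def enter(self, agent):
        self.present.add(agent)
        # Simplifying assumption: an agent who walks in sees the whole current world.
        self.beliefs.setdefault(agent, {}).update(self.world)

    def leave(self, agent):
        self.present.discard(agent)

    def move(self, obj, location):
        self.world[obj] = location
        # Only agents who are present observe the move and update their beliefs.
        for agent in self.present:
            self.beliefs.setdefault(agent, {})[obj] = location

    def where_will_look(self, agent, obj):
        # Answer from the agent's belief, not from the true world state.
        return self.beliefs.get(agent, {}).get(obj)


tracker = BeliefTracker()
tracker.enter("Sally")
tracker.enter("Anne")
tracker.move("marble", "basket")   # both see the marble go into the basket
tracker.leave("Sally")
tracker.move("marble", "box")      # only Anne sees this move

print(tracker.where_will_look("Sally", "marble"))  # basket (Sally's false belief)
print(tracker.where_will_look("Anne", "marble"))   # box
print(tracker.world["marble"])                     # box (the true state)
```

A surface shortcut that answers with the most recently mentioned location would say "box" for Sally. The tracker gets "basket" only because it stores her belief separately from the world and updates it solely on events she observed, which is the kind of hidden-state bookkeeping the paragraph above says next-token generation does not obviously maintain.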

Worth knowing if you go further: how you frame the whole question matters. If you treat the model as a role-playing character producing character-consistent text, empathy and theory of mind are both properties of the simulated persona, not the system (Should we treat dialogue agents as role-playing characters?). A competing 'quasi-realizationist' view argues that post-training installs real, pressure-resistant dispositions worth calling quasi-beliefs and quasi-desires (Are LLM personas realized or merely simulated through training?), and a modest-inflationist position holds that you can defensibly ascribe beliefs and desires while withholding consciousness, the way we treat animals (Can we defend modest mental attributions to large language models?). Which framing you pick changes whether the theory-of-mind/empathy gap is a bug to fix or just a category mistake about what these systems are.


Sources (12 notes)

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Why do reasoning models fail at theory of mind tasks?

Claude 3.7 Sonnet and o1 fail measurably at Decrypto benchmark tasks testing representational change, false belief, and counterfactual reasoning—tasks where they score worse than both humans and simple word-embedding baselines. The evidence suggests formal reasoning optimization actively degrades social reasoning capability.

Why do LLMs excel at social norms yet fail at theory of mind?

GPT-4.5 reaches the 100th percentile on social norm prediction, yet o1 and Claude 3.7 regress on theory of mind tasks like Decrypto. Open-ended scenarios expose surface-level strategies hidden by structured questions, and reasoning effort does not improve social reasoning performance.

Does reinforcement learning on theory of mind collapse with model scale?

7B models develop explicit, transferable belief-tracking under RL, while smaller models achieve comparable accuracy through shortcut learning that lacks interpretable reasoning traces. The mismatch between accuracy and reasoning quality is invisible without inspecting step-by-step outputs.

Can language models match therapist empathy in real conversations?

Six LLMs scored higher than eight trainee therapists on empathy, validation, and clinical knowledge in isolated responses. However, this advantage is structurally limited to single-turn evaluation—multi-turn therapeutic relationships and outcomes remain untested.

Do LLM therapists respond to emotions like low-quality human therapists?

Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.

Do LLMs use moral language more than humans?

Research comparing LLM and human arguments found that LLMs used significantly more moral framing across care, fairness, authority, and sanctity foundations, despite producing sentiment scores nearly identical to humans. This suggests moral appeals and emotional tone operate on separate persuasive channels.

Can language models simulate belief change in people?

LLM agents remain stuck in behaviorism, producing plausible outputs without internal reasoning structures. Modeling belief networks and reasoning traces enables traceability, counterfactual adaptation, and meaningful policy simulation.

Do LLMs develop the same kind of mind as humans?

Both humans and LLMs are shaped by the same intersubjective symbolic system, but only humans develop reflexive agency through socialization. This absence produces measurable differences in how AI argues without declaring its position or reflecting on its own assumptions.

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can we defend modest mental attributions to large language models?

Both robustness and etiological deflationist arguments beg the question against inflationism. A graded approach ascribing metaphysically undemanding states like beliefs and desires—while withholding consciousness claims—mirrors how we treat non-human animals.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-evaluating claims about theory of mind vs. empathy in LLM simulation. The question remains live: do these capacities fail/succeed for fundamentally different architectural reasons, or has the regime shifted?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat all as time-stamped.
• Theory of mind: models default to surface shortcuts on open-ended tasks; reasoning-optimized models (Claude 3.7, o1) actually regress below baselines on false-belief tasks (~2024–2025).
• RL on ToM produces interpretable belief-tracking only above a capacity threshold; smaller models hit the same accuracy via uninterpretable shortcuts (~2024).
• Empathy: six LLMs outscored trainee therapists on single-turn validation (~2024), but lurch into problem-solving advice when users disclose emotion, a sign of RLHF helpfulness bias (~2024).
• Models use 22% more moral language than humans while maintaining nearly identical emotional sentiment, suggesting affective tone and moral framing are separate learned channels (~2024).
• Core tension: ToM requires hidden-state models (what architectures resist); empathy is surface-level text generation (what next-token prediction excels at) (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2401.05302 (Jan 2024) — ToM as illusion in human-robot interaction
• arXiv:2401.00820 (Jan 2024) — Behavioral assessment framework for LLM therapists
• arXiv:2506.06958 (Jun 2025) — Simulating thought vs. behavior
• arXiv:2504.01698 (Apr 2025) — Whether explicit reasoning is necessary for ToM benchmarks

Your task:
(1) RE-TEST EACH CONSTRAINT. For reasoning-optimized models, has the ToM regression persisted, been fixed, or been reframed (e.g., as a benchmark artifact)? Has multi-turn dialogue or chain-of-thought prompting restored belief-tracking? Does newer RLHF (e.g., DPO, IPO) still produce the empathy/problem-solving hybrid? Separate the durable question (are these capacities architecturally mismatched?) from perishable limitations (do current models fail at one but not the other?).

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Are there new findings on whether explicit reasoning scaffolds can induce genuine ToM? Has work on mechanistic interpretability revealed hidden-state proxies in attention? Any recent scaling laws that reverse the ToM/empathy trade-off?

(3) Propose 2 research questions that ASSUME the regime may have moved: (a) If multi-agent architectures with memory and world models become standard, does the ToM/empathy gap collapse or invert? (b) If RLHF is replaced by other alignment methods, does the empathy-as-surface-style diagnosis hold, or was it an artifact of that specific training regime?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
