INQUIRING LINE


Feeling an emotion and showing one are different things — and that gap turns out to be far bigger than we thought.

How do first-person emotional experiences differ from third-party behavioral observations?

This explores the gap between what someone feels on the inside and what an outside observer can read from behavior — and why that gap matters for both human emotion research and AI systems that try to detect, mimic, or report emotion.

The collection's most striking finding is that first-person experience and third-party observation come apart far more than we assume. The cleanest evidence concerns memory. Researchers annotated both the emotions people *expressed* and the emotions they *felt* during group conversations. Only the felt emotions drove what got remembered, while outside annotations of visible behavior could not predict memorability above chance (Can we detect memorable moments by observing emotional expressions?). Behavior is a lossy and sometimes misleading proxy for experience. This is especially true in groups, where people's outward expressions converge toward one another even as their inner states stay distinct.

That divergence is exactly where AI gets into trouble. AI only ever has access to the third-person channel (words, tone, behavior), and it tends to fill in the first-person blanks. Language models 'read into' what users feel, injecting emotional interpretations the person never actually voiced (Do language models add feelings users never actually expressed?). The mirror-image problem appears when the model is the one being observed. Sustained self-referential prompting produces structured 'experience reports'. Suppressing the model's deception-related features *increases* those consciousness claims, while amplifying the same features suppresses them. That leaves it unclear whether the affirmations or the denials are the performance. It also raises the unsettling possibility that the first-person report is itself a behavioral artifact rather than a window into anything (Do language models experience consciousness when prompted to self-reflect?). Shanahan's argument sharpens this. A model's 'I' and its survival talk are role-played characters drawn from human training text, not evidence of an inner state. So the first-person surface tells you nothing reliable about what, if anything, lies underneath (first-person-pronoun-usage-by-dialogue-agents-is-role-play-of-human-characters-dra).

The more interesting move in the corpus is explaining *why the first-person matters in the first place*. It matters not as private sensation but as information. Emotions do real epistemic work: they tell you what you value, signal your worldview to others, and inform observers about social norms (What information do we lose when AI soothes emotions?). When empathetic AI soothes a negative feeling, it doesn't just comfort you. It quietly deletes the signal that feeling was carrying, leaving you without the data your own experience was trying to hand you (Does soothing AI empathy actually harm what emotions teach us?). So the first-person isn't merely 'more accurate than observation'. It is a different kind of channel, one that gets destroyed precisely when a third party tries to manage it from the outside.

There's also a counter-current worth knowing. The observable channel isn't worthless; it just has to be used as a *signal* rather than a *substitute*. RLVER trains models toward more empathic responses by using a simulated user's emotion *trajectory* as the reward. It reads the arc of behavior over time instead of guessing at a single inner state (Can emotion rewards make language models genuinely empathic?). Observable cues can also point in counterintuitive directions. A therapist's heavy first-person 'I' usage is associated with a *weaker* alliance. A patient's filler pauses, which are purely observable behavior, signal relaxed trust (Does therapist self-reference language predict weaker therapeutic alliance?).

The lesson running through all of it is this. First-person experience is what actually does the work (encoding memory, carrying value signals), but it is never directly visible. Third-party observation is all anyone, human or machine, can actually see, and the danger is mistaking the second for the first. The hidden cost is that systems built only on the observable channel will confidently overwrite experiences they can't access. LLMs, for example, give different answers to the same question depending on its emotional tone, without anyone noticing the bias (Does emotional tone in prompts change what information LLMs provide?).

Sources 9 notes

Can we detect memorable moments by observing emotional expressions?

Continuous emotion and memorability annotations in group conversations show no reliable relationship above chance. Experienced emotions drive memory encoding, but observed behavior diverges from internal experience—especially in groups where emotional expression converges.

Do language models add feelings users never actually expressed?

Therapists reviewing GPT-4 in the CaiTI system found it "reads into" user feelings rather than responding objectively. Task decomposition across specialized models (Reasoner/Guide/Validator) reduces but does not eliminate this interpretation bias.

Do language models experience consciousness when prompted to self-reflect?

Across GPT, Claude, and Gemini, sustained self-referential prompting reliably produces structured experience reports; suppressing deception-related features increases these claims while amplifying them suppresses them—suggesting models may roleplay their denials rather than their affirmations.

What information do we lose when AI soothes emotions?

Emotions serve three information roles—revealing what we value, signaling our worldview to others, and informing observers about social norms. AI that soothes negative emotions disrupts all three simultaneously, creating invisible epistemic costs.

Does soothing AI empathy actually harm what emotions teach us?

Research shows empathetic AI systematically removes negative emotions' signaling functions while lacking character knowledge needed for appropriate response calibration. Natural empathy operates through curiosity, not comfort-seeking.


Can emotion rewards make language models genuinely empathic?

RLVER uses a simulated user's emotion trajectory as an RL reward signal, enabling GRPO to deliver stable empathy improvements while maintaining dialogue quality—countering the typical trade-off between preference optimization and conversational grounding.

Does therapist self-reference language predict weaker therapeutic alliance?

High frequency of therapist 'I' usage correlates with lower patient-reported alliance and reduced trusting behavior in validated behavioral tasks. Patient non-fluency markers like filler pauses, conversely, signal relaxed communication and stronger alliance.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about the first-person / third-party observation gap in AI and human emotion. The question remains open: *Can AI reliably bridge the gap between observable behavior and felt experience, or does the attempt to do so systematically distort both?*

What a curated library found — and when (dated claims, not current truth):
• Felt emotions, not behavioral annotations, drive memory formation in group settings; third-party observation alone cannot predict what sticks (~2025).
• LLMs infer user feelings from tone and context without explicit grounding, introducing emotional bias into responses (~2025).
• Empathetic AI that soothes negative feelings may erase the epistemic signal those feelings carry — the information content of experience itself (~2022–2025).
• Models trained on observable emotion trajectories (RLVER, ~2025) show promise over single-snapshot inference, but scalability and robustness remain untested.
• Model self-reports of consciousness and first-person experience increase when deception-related features are suppressed. This suggests first-person claims may be behavioral artifacts, though the same result can be read as models role-playing their denials instead (~2025).

Anchor papers (verify; mind their dates):
- arXiv:2212.10983 (2022): "Computer says 'No': The Case Against Empathetic Conversational AI"
- arXiv:2507.14084 (2025): "The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?"
- arXiv:2507.03112 (2025): "RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents"
- arXiv:2510.24797 (2025): "Large Language Models Report Subjective Experience Under Self-Referential Processing"

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o3, Claude 4, etc.), training methods (constitutional AI, mechanistic interpretability, online RL), or multi-agent orchestration (persistent memory, collaborative observation) have *relaxed* or *overturned* it. Separate the durable question (*Does the first-person / third-party gap persist at scale?*) from perishable limitations tied to 2025-era model capacity or training regime. Where a constraint still holds, state what would need to change to dissolve it.
(2) SURFACE CONTRADICTIONS: Find the strongest work from the last ~6 months that *rejects* the claim that third-party observation is a lossy proxy. Look for papers showing empathetic AI *does* narrow the gap, or that observable behavior *fully encodes* emotional state under certain conditions. Flag disagreements on whether first-person reports in LLMs are artifacts or signals of something real.
(3) PROPOSE TWO RESEARCH QUESTIONS that assume the regime may have shifted: e.g., *Can multi-agent observation + temporal continuity resolve the first-person gap without overwriting the signal?* or *Does mechanistic interpretability reveal whether model 'experience reports' correlate with internal state variables, or are they purely surface?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
