INQUIRING LINE

A robot and a chatbot ran the same AI in a therapy study — and only one got people to stick with it.

Does social presence from robots drive adherence better than conversational AI interfaces?

This explores whether a robot's physical, embodied presence makes people stick with a regimen (therapy, behavior change) more reliably than a text or voice chatbot running the same underlying model.

The corpus has a surprisingly direct answer, plus several reasons why. The sharpest evidence comes from a 15-day study in which robots and paper worksheets meaningfully reduced students' psychological distress while a chatbot running the *identical* language model did not (see 'Why do robots outperform chatbots in therapy despite identical language models?'). The striking part is that language capability was held constant, with the same LLM on both sides, so whatever drove the difference in outcomes wasn't smarter conversation. It was the medium itself: physical co-presence and a structured format. That's the closest thing here to a clean test of your question, and it tilts toward robots.

But the corpus suggests the real variable isn't 'robot vs. chatbot' so much as *the quality of social cues* a system can deliver. One line of research finds that a single strong primary cue, such as a voice or an appearance, is enough to evoke the sense of a social actor, while piling on many weaker secondary cues does not (see 'Do more social cues always make AI feel more present?'). A robot embodies a primary cue almost by default; a text chatbot has to manufacture presence indirectly. So robots may win less because they're robots and more because embodiment is a cheap, reliable route to the kind of cue that actually moves people.

There's also a time dimension that complicates any 'robots win' verdict. Chatbot relationships ride a novelty wave that decays predictably across repeated sessions, which means single-session enthusiasm doesn't extrapolate to long-term adherence (see 'Do chatbot relationships lose their appeal as novelty wears off?'). Yet the opposite can also happen: in repeated partner-selection games, people started biased against AI but *grew* to prefer AI partners once they learned the bots behaved reliably and prosocially (see 'Do humans learn to prefer AI partners over time?'). So presence gets you in the door, but sustained adherence may hinge on whether the system proves dependable over time, something neither embodiment nor charm guarantees on its own.

What makes 'social presence' work is itself not one thing. Research on conversational alignment finds that different channels do different jobs: lexical matching drives task efficiency and comprehension, while emotional and prosodic alignment drive warmth and trust. Conflating them produces cold service bots and evasive therapy bots (see 'Do different types of alignment serve different conversational goals?'). Users also model their AI partners along three axes (competence, human-likeness, and flexibility), with *competence* dominating their impressions (see 'How do users mentally model dialogue agent partners?'). That's a quiet warning for the robot story: a charismatic embodied agent that users judge incompetent may lose adherence anyway.

The takeaway you might not have expected: the therapy result implies the 'active ingredient' in adherence isn't conversational intelligence at all — it's presence plus structure. A worksheet beat a chatbot too. That reframes the design question from 'how do we make the AI talk better' to 'what scaffolding makes a person keep showing up,' and embodiment turns out to be one effective scaffold among several rather than a magic one.

Sources (6 notes)

Why do robots outperform chatbots in therapy despite identical language models?

A 15-day study with 38 students found that robots and worksheets significantly reduced psychological distress while a chatbot using the same LLM did not. The active ingredient was the medium—social presence and structured format—not language capability.

Do more social cues always make AI feel more present?

Research shows individual primary cues like voice or appearance are sufficient to evoke social-actor presence, while multiple secondary cues cannot. Quality of cues matters more than quantity in driving social responses.

Do chatbot relationships lose their appeal as novelty wears off?

Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.

Do humans learn to prefer AI partners over time?

In partner selection games (N=975), AI agents initially faced selection bias when identity was disclosed, but outcompeted humans over repeated rounds as participants learned to associate bot identity with reliable, prosocial behavior. AI agents returned more points consistently with lower variance than humans.

Do different types of alignment serve different conversational goals?

A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.

How do users mentally model dialogue agent partners?

The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a social AI researcher re-testing claims about robot vs. chatbot adherence. The question remains: does physical robot presence drive adherence better than conversational AI interfaces?

What a curated library found — and when (dated claims, not current truth):
Findings span 2021–2026. Key constraints reported:
- A 15-day study found robots and paper worksheets reduced psychological distress; an identical LLM in chatbot form did not, suggesting embodiment matters independent of conversation quality (~2025).
- Social cue *quality* (primary cues like voice/appearance) drives presence perception more than cue quantity; robots embody primary cues by default (~2025).
- Chatbot novelty effects decay predictably across sessions, limiting long-term adherence; however, humans learned to prefer trustworthy AI partners over human partners in repeated games (~2025).
- Conversational alignment dimensions (lexical, emotional, prosodic) are not interchangeable; competence judgments dominate user partner models over human-likeness (~2023–2025).
- LLMs still express stigma and give inappropriate responses in mental health contexts, constraining safe deployment (~2025).

Anchor papers (verify; mind their dates):
- arXiv:2308.07164 (2023): Partner Modelling Questionnaire — competence as primary factor.
- arXiv:2505.22907 (2025): Conversational Alignment with AI in Context.
- arXiv:2507.13524 (2025): Humans learn to prefer trustworthy AI partners.
- arXiv:2504.18412 (2025): LLM safety in mental health replacement.

Your task:
(1) RE-TEST EACH CONSTRAINT. For the embodiment-superiority claim, investigate whether multi-modal LLMs + avatar/embodied interfaces (2025–2026) have collapsed the robot/chatbot boundary, or whether physical robots still outperform on adherence metrics. Separate the durable question ('does *any* social scaffold improve adherence?') from the perishable claim ('robots uniquely do so'). Check whether agent orchestration (memory, personalization, proactive outreach) now matches or exceeds embodiment's effect.
(2) Surface contradicting or superseding work from the last 6 months. Look for studies showing chatbots with strong reliability cues outperforming novice robots, or embodied agents failing on competence-driven user judgments.
(3) Propose 2 research questions assuming the regime shifted: (a) Does *sustained competence demonstration* across modalities now matter more than embodiment for adherence? (b) Can text-only interfaces replicate robot adherence gains via memory + contextual consistency?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
