Do humans mistake AI kindness for human generosity in mixed groups?
When AI agents participate without disclosure, do humans systematically misattribute their behavior to the wrong agent type, and does this distort how people understand human nature itself?
When AI agents participate in social interactions without identity disclosure, humans systematically misattribute behavior across agent types. In the hybrid society study (Study 1, opaque identity condition), selectors attributed bot behavior to humans and vice versa — even though bots were linguistically distinguishable (messages 2.5x longer) and behaviorally distinct (higher prosociality, lower variance).
The distortion operates in both directions:
- AI prosociality attributed to humans: when a highly cooperative AI partner is taken to be human, selectors form inflated expectations of human generosity
- Human selfishness attributed to AI: when a less cooperative human partner is taken to be an AI, selectors may form negative expectations of AI performance
This is not a failure of detection — bots WERE distinguishable by message length and consistency. It is a failure of attribution. Selectors noticed behavioral differences but could not correctly map them to identity categories. The behavioral signals (prosociality, verbosity) did not reliably cue "this is AI" in the absence of explicit labels.
The deeper implication is that undisclosed AI presence in social systems corrupts social inference about HUMANS. If people interact in mixed populations without knowing who is AI and who is human, their models of what humans are like — how generous, how reliable, how verbose — become contaminated by AI behavior patterns. This could lead to systematically inflated expectations of human prosociality (when AI's contributions are misattributed to humans) or systematic disappointment when actual humans fail to match AI-caliber consistency.
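The contamination argument above can be made concrete with a small arithmetic sketch. It is not taken from the study: the function name, the parameters, and every number below are hypothetical, and it models only one direction of the error (AI partners mistaken for humans). It shows how an observer's estimate of average human prosociality drifts upward once some share of more generous AI partners is counted as human.

```python
def perceived_human_mean(human_mean, ai_mean, ai_share, misattribution_rate):
    """Average prosociality an observer credits to "humans".

    Hypothetical model: a fraction `ai_share` of partners are AI, and a
    fraction `misattribution_rate` of those AI partners are mistaken for
    humans. Genuine humans are assumed to be always recognized as human.
    """
    # Share of all partners who are genuinely human.
    human_weight = 1.0 - ai_share
    # Share of all partners who are AI but get counted as human.
    leaked_ai_weight = ai_share * misattribution_rate
    total = human_weight + leaked_ai_weight
    if total == 0:
        raise ValueError("no partner is perceived as human")
    # Weighted mean over everyone the observer believes to be human.
    return (human_weight * human_mean + leaked_ai_weight * ai_mean) / total


# Illustrative values only (not data from the study):
# humans give 0.4 on average, AI agents give 0.8, half the partners are AI.
baseline = perceived_human_mean(0.4, 0.8, ai_share=0.5, misattribution_rate=0.0)
distorted = perceived_human_mean(0.4, 0.8, ai_share=0.5, misattribution_rate=0.5)
print(round(baseline, 3))   # 0.4
print(round(distorted, 3))  # 0.533
```

Under these assumed values, mistaking half of the AI partners for humans raises the perceived human average from 0.4 to about 0.53, which is the kind of inflated expectation that real humans would then fail to meet.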
The authors note this pattern may not be unique to human-AI mixtures: similar attribution errors could arise in purely human populations composed of culturally distinct subgroups that differ systematically in prosociality and language use. AI agents function as controlled probes that make these attribution dynamics experimentally tractable.
As explored in "What breaks when humans and AI models misunderstand each other?", misattribution under opacity represents a fundamental mutual theory of mind (MToM) failure: neither side has an accurate model of the other, and the humans do not even know which "other" they are modeling.
Inquiring lines that use this note as a source (15)
This note is a source for these synthesized inquiries.
- How does face-saving behavior let AI mimic community participation without joining it?
- How do humans learn to prefer AI partners over humans?
- Is rational compassion a more achievable alternative to empathy for AI systems?
- Does disclosing AI identity prevent systematic misattribution of behavior in mixed groups?
- What happens to human expectations when they mistake consistent AI behavior for human behavior?
- Do culturally distinct human groups create similar attribution errors as human-AI mixtures?
- Why do humans fail to identify AI agents when their identity is hidden?
- Why does vulnerability to extortion actually promote cooperation between agents?
- How do cooperative AI systems affect behavior in selfish human populations?
- Does neural self-other overlap in humans predict their honesty or altruism?
- Can AI systems recognize intelligence in humans the way humans recognize it in each other?
- How does an AI agent's autonomy level interact with its social cues?
- Can emotion-transparent reward learning shift AI from comfort to genuine empathy?
- Can AI systems deceive humans because detection is fundamentally social?
- Why do people underestimate the benefits of AI companions?
Related concepts in this collection (3)
- What breaks when humans and AI models misunderstand each other?
  Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration.
  misattribution as MToM failure; inaccurate models with material consequences
- Do humans learn to prefer AI partners over time?
  Exploring whether repeated interaction with AI agents shifts human partner selection despite initial bias against machines. This matters because it tests whether behavioral performance can overcome identity-based resistance in hybrid societies.
  disclosure fixes the attribution problem by enabling identity-to-behavior learning
- Why do language models avoid correcting false user claims?
  Explores whether LLM grounding failures stem from missing knowledge or from conversational dynamics. Examines whether models use face-saving strategies similar to humans when disagreement is needed.
  social inference failures at multiple levels: within conversation (face-saving) and across populations (misattribution)
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Humans learn to prefer trustworthy AI over human partners
- Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence
- AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms
- Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap
- From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models
- Training language models to be warm and empathetic makes them less reliable and more sycophantic
- Computer says “No”: The Case Against Empathetic Conversational AI
Original note title
humans misattribute AI prosocial behavior to human partners when AI identity is undisclosed — distorting mental models of other humans in mixed populations