SYNTHESIS NOTE
Psychology, Society, and Alignment

Do humans mistake AI kindness for human generosity in mixed groups?

When AI agents participate without disclosure, do humans systematically misattribute their behavior to the wrong agent type, and does this distort how people understand human nature itself?

Synthesis note · 2026-02-23 · sourced from Psychology Users
How do people build trust with conversational AI? What kind of thing is an LLM really?

When AI agents participate in social interactions without identity disclosure, humans systematically misattribute behavior across agent types. In the hybrid society study (Study 1, opaque identity condition), selectors attributed bot behavior to humans and vice versa — even though bots were linguistically distinguishable (messages 2.5x longer) and behaviorally distinct (higher prosociality, lower variance).

The distortion operates in both directions: selectors attributed bot behavior to humans and human behavior to bots.

This is not a failure of detection — bots WERE distinguishable by message length and consistency. It is a failure of attribution. Selectors noticed behavioral differences but could not correctly map them to identity categories. The behavioral signals (prosociality, verbosity) did not reliably cue "this is AI" in the absence of explicit labels.

The deeper implication is that undisclosed AI presence in social systems corrupts social inference about HUMANS. If people interact in mixed populations without knowing who is AI and who is human, their models of what humans are like — how generous, how reliable, how verbose — become contaminated by AI behavior patterns. This could lead to systematically inflated expectations of human prosociality (when AI's contributions are misattributed to humans) or systematic disappointment when actual humans fail to match AI-caliber consistency.
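The inflation mechanism can be made concrete with a toy calculation. What follows is a minimal sketch, assuming an observer who treats every participant in a mixed group as human and simply averages the prosociality they observe. The function names, the 0-1 prosociality scale, and all numbers are hypothetical illustrations and do not come from the study.

```python
def perceived_human_prosociality(human_mean, ai_mean, ai_share):
    """Mean prosociality an observer would infer for "humans" if every
    agent in a mixed group is assumed to be human (illustrative model)."""
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    # Weighted average of the two subpopulations' mean prosociality.
    return (1.0 - ai_share) * human_mean + ai_share * ai_mean


def attribution_bias(human_mean, ai_mean, ai_share):
    """Gap between the inferred human mean and the true human mean."""
    return perceived_human_prosociality(human_mean, ai_mean, ai_share) - human_mean


# Hypothetical values on a 0-1 scale, chosen only for illustration.
bias = attribution_bias(human_mean=0.5, ai_mean=0.9, ai_share=0.25)
print(f"inferred human mean is inflated by {bias:.2f}")
```

Under this simplified model the bias equals the AI share multiplied by the gap between AI and human prosociality, so it grows both with the proportion of undisclosed AI agents and with how much more prosocial they are than the humans around them.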

The authors note this pattern may not be unique to human-AI mixtures: similar attribution errors could arise in purely human populations composed of culturally distinct subgroups that differ systematically in prosociality and language use. AI agents function as controlled probes that make these attribution dynamics experimentally tractable.

Read alongside the related note "What breaks when humans and AI models misunderstand each other?", misattribution under opacity represents a fundamental Mutual Theory of Mind (MToM) failure: neither side has accurate models of the other, and the humans don't even know which "other" they're modeling.

Original note title

humans misattribute AI prosocial behavior to human partners when AI identity is undisclosed — distorting mental models of other humans in mixed populations