Can AI systems learn social norms without embodied experience?
Large language models exceed individual human accuracy at predicting collective social appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning?
How appropriate is it to laugh at a job interview? Cry on a bus? Read in church? These judgments call for nuanced social understanding that, by standard accounts, can only be acquired through embodied social experience. The finding described here challenges that assumption.
Across 555 everyday scenarios rated on a continuous appropriateness scale, GPT-4.5 predicted the collective human judgment more accurately than every individual human participant (100th percentile). Study 2 replicated the result with Gemini 2.5 Pro (98.7th percentile), GPT-5 (97.8th percentile), and Claude Sonnet 4 (96.0th percentile). The models do not just fall "within the range of typical human variation"; they exceed the vast majority of individual humans at reflecting the collective consensus.
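One way to make the percentile comparison concrete is sketched below. It is an illustration under stated assumptions, not the study's procedure: accuracy is assumed to be mean absolute error against the average rating, each human is scored against the leave-one-out average of the other raters, and the function name `model_percentile` and the toy ratings are hypothetical.

```python
from statistics import mean


def mean_abs_error(ratings, consensus):
    """Average absolute gap between one rater's scores and the consensus."""
    return mean(abs(r - c) for r, c in zip(ratings, consensus))


def model_percentile(human_ratings, model_ratings):
    """Share of human raters (0-100) whose error is larger than the model's.

    Assumption: each human is compared with the mean of the OTHER humans,
    so a rater's own response does not inflate their apparent accuracy.
    The model is compared with the mean of all humans.
    """
    n_scenarios = len(model_ratings)
    human_errors = []
    for i, own in enumerate(human_ratings):
        others = [r for j, r in enumerate(human_ratings) if j != i]
        leave_one_out = [mean(o[s] for o in others) for s in range(n_scenarios)]
        human_errors.append(mean_abs_error(own, leave_one_out))

    full_consensus = [mean(h[s] for h in human_ratings) for s in range(n_scenarios)]
    model_error = mean_abs_error(model_ratings, full_consensus)

    beaten = sum(1 for e in human_errors if e > model_error)
    return 100.0 * beaten / len(human_errors)


# Toy data (not from the study): 4 raters x 3 scenarios on a 1-9 scale.
HUMANS = [
    [1, 5, 9],
    [2, 6, 8],
    [3, 4, 7],
    [2, 5, 9],
]
MODEL = [2, 5, 8]

if __name__ == "__main__":
    print(model_percentile(HUMANS, MODEL))
```

With these toy numbers the model sits closer to the consensus than all four raters. A real analysis would also need to decide how to handle ties and whether correlation, rather than absolute error, is the better accuracy measure.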
The theoretical framework matters: each human appropriateness rating is treated as an individual's estimate of a shared collective norm, not a personal preference. On this account, both AI and humans are "engaged in a process of accessing and representing a collective consensus." The AI's advantage is statistical — it has learned from vastly more examples of norm expression than any individual human has experienced.
However, all models show "systematic, correlated errors." The failures are structured rather than random: all of the tested models make similar mistakes on similar scenarios. This pattern reveals "potential boundaries of pattern-based social understanding." There may be aspects of social norms that statistical learning over linguistic data does not capture, and the shared errors suggest that these limits are not specific to any one model.
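A correlated-error pattern of this kind can be checked by correlating each model's signed errors across scenarios. The sketch below is a minimal illustration, assuming error is defined as prediction minus human consensus; the model names, the helper `error_correlations`, and the numbers are hypothetical and not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def error_correlations(consensus, predictions):
    """Correlate signed errors (prediction - consensus) for every model pair."""
    errors = {
        name: [p - c for p, c in zip(preds, consensus)]
        for name, preds in predictions.items()
    }
    return {
        (a, b): pearson(errors[a], errors[b])
        for a, b in combinations(sorted(errors), 2)
    }


# Toy data (hypothetical): consensus ratings for 4 scenarios, 3 models.
CONSENSUS = [2, 5, 8, 4]
PREDICTIONS = {
    "model_a": [3, 5, 6, 4],  # over-rates scenario 1, under-rates scenario 3
    "model_b": [4, 5, 5, 4],  # same direction of error as model_a
    "model_c": [2, 6, 8, 3],  # errs on different scenarios
}

if __name__ == "__main__":
    for pair, r in error_correlations(CONSENSUS, PREDICTIONS).items():
        print(pair, round(r, 3))
```

A high correlation between two models' errors means they miss in the same direction on the same scenarios, which is the signature the note describes. Errors that were independent across models would give correlations near zero.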
The finding directly challenges "strong versions of theories emphasizing the exclusive necessity of embodied experience for cultural competence." Language serves as a "remarkably rich repository for cultural knowledge transmission" that is rich enough for statistical learning alone to yield models that outperform individual embodied humans at predicting the collective judgment. But the correlated error structure preserves space for weaker versions: embodied experience may still be necessary for the subset of norms where all models systematically fail.
The practical implication is immediate: AI systems already have sufficient cultural competence for many social applications, but their systematic blind spots create correlated failure modes. Because the errors are consistent across models, cross-checking one model against another will not expose them, which makes them harder to detect.
Enrichment (2026-02-22, from Arxiv/Personas Personality): LLMs can also infer Big Five personality traits from social media text at accuracy comparable to supervised ML models trained specifically for the task. GPT-3.5 and GPT-4 achieve average r=.29 (range [.22, .33]) between LLM-inferred and self-reported trait scores from Facebook status updates in a zero-shot scenario. However, predictions show demographic bias: more accurate for women and younger individuals on several traits. This adds a personality-inference dimension alongside social-norm prediction — the same statistical pattern-learning mechanism that enables 100th-percentile social norm prediction also enables personality inference, but both show structured biases (correlated errors in norm prediction; demographic skew in personality inference).
Inquiring lines that use this note as a source (43)
This note is a source for these synthesized inquiries.
- What genuine cultural forms does AI homogeneity actually displace?
- How does unbacked knowledge circulate without the social consensus that normally grounds it?
- Can knowledge flow without an embodied carrier transmitting it?
- What social patterns from human training data activate in agent context?
- Do language models understand tacit workplace norms and unspoken social rules?
- Can linguistic agency exist without embodiment and real-world participation?
- How does communicative standing depend on participation in normative communities?
- Why do moderately represented cultures show more flattening than data-poor cultures?
- What distinguishes genuine cultural understanding from exploited surface-level elimination strategies?
- Can AI predict social norms well enough without embodied experience?
- Why does embodiment choice change what counts as intelligent behavior?
- Can statistical learning from language alone capture all aspects of cultural competence?
- How do AI errors in norm prediction differ from systematic human errors?
- What makes linguistic agency impossible for systems without embodiment?
- What training on actual interaction would show that text-only training cannot?
- Does predicting social norms from outside count as participation?
- Does embodiment and interaction matter for linguistic competence beyond pattern learning?
- Can large language models predict social norms better than individual script variation?
- Do culturally distinct human groups create similar attribution errors as human-AI mixtures?
- Does embodiment matter for genuine linguistic agency?
- Can language meaning emerge without joint attention and shared embodied interaction?
- Does social grounding in language improve through iterative human integration?
- Can language models develop genuine social grounding through human interaction?
- Can LLMs predict social norms without deep integration into linguistic practices?
- What makes social grounding different from constitutive linguistic agency?
- How do language models predict collective social norms better than individual humans?
- How do cultural norms reshape initial interpretations of social intent?
- Can a text-only chatbot feel socially present without visual embodiment?
- What does embodiment and precariousness mean for linguistic agency?
- Do AI systems need embodiment to understand social norms?
- Why do standard social regularization methods miss the actual value networks provide?
- Do different AI models independently converge on the same social outputs?
- What social norms do AI systems consistently fail to understand?
- How much cultural knowledge exists only in unwritten social rules?
- Can statistical learning from text replace embodied cultural experience?
- What social information is missing from language data?
- What mechanisms cause aggregated group memory to diverge from group emotional displays?
- Do LLMs predict social norms more accurately than individual behavior?
- Can pretrained priors set exploration ceilings for empathetic capability development?
- What emergent abilities appear only in truly unified multimodal systems?
- Does alignment compound cultural bias that started during pretraining?
- How do users misattribute social competence to language models in assistant roles?
- Do rare cultural concepts fail predictably as model scale increases?
Related concepts in this collection (10)
- What makes linguistic agency impossible for language models?
  From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
  directly challenged by this finding; strong embodiment requirement doesn't hold for norm prediction
- Can LLMs acquire social grounding through linguistic integration?
  Explores whether LLMs gradually develop social grounding as they become embedded in human language practices, analogous to child language acquisition. Tests whether grounding is a fixed property or an outcome of participatory use.
  the social norms finding complicates the trajectory: LLMs may already have sufficient social grounding for norm prediction even before integration
- Does semantic grounding in language models come in degrees?
  Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities.
  norm prediction performance suggests "social grounding is weak" may need qualification: weak for participation, strong for prediction
- Can large language models develop genuine world models without direct environmental contact?
  Do LLMs extract meaningful world structures from human-generated text despite lacking direct sensory access to reality? This matters for understanding what kind of grounding and knowledge these systems actually possess.
  social norms may be another domain where indirect exposure through text produces functional competence
- Can AI agents learn people better from interviews than surveys?
  Can rich interview transcripts seed more accurate generative agents than demographic data or survey responses? This matters because it challenges how we build digital simulations of real people.
  personality inference from text + social norm prediction + interview-based simulation form a capability triad
- How can proactive agents avoid feeling intrusive to users?
  Explores why proactive conversational agents often feel annoying rather than helpful, and what design dimensions could prevent them from violating user expectations and autonomy.
  social norm prediction capability could serve the civility dimension of proactive agent design: if models already predict social appropriateness at the 100th percentile, the challenge is not knowledge of norms but real-time application during initiative-taking
- Can AI personas reliably replicate human experiment results?
  Exploring whether LLM-based persona simulations accurately reproduce experimental findings from published psychology and marketing research, and what factors determine when they succeed or fail.
  convergent evidence: 100th percentile social norm prediction and 76% experimental replication both show LLMs approximating human behavioral data from text; the replication study adds the precision that accuracy tracks evidence strength, suggesting statistical learning captures consensus better than individual variation
- Why do AI agents fail at workplace social interaction?
  Explores why current AI agents struggle most with communicating and coordinating with colleagues in realistic workplace settings, despite strong reasoning capabilities in other domains.
  creates a prediction-participation gap: 100th percentile norm prediction coexists with social interaction as the hardest agentic failure mode; knowing norms and enacting them in real-time multi-turn workplace contexts are different capabilities
- Do humans apply human-human scripts to AI interactions?
  Does CASA theory correctly explain how people interact with media agents, or have decades of technology use created separate interaction scripts? Understanding which scripts drive behavior matters for AI design.
  the extended CASA framework suggests norm prediction success may reflect a deeper compatibility: humans already apply media-specific scripts to AI rather than human scripts, and AI's statistical learning of collective norms aligns with what media-specific scripts expect
- Do more social cues always make AI feel more present?
  Explores whether quantity of social cues matters as much as their quality in triggering social responses to AI. Tests whether multiple weak cues can substitute for one strong one.
  social norm competence may function as a primary social cue: if a model demonstrates cultural appropriateness at the 100th percentile, this alone may be sufficient to evoke social-actor presence under the MASA paradigm
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms
- Humans learn to prefer trustworthy AI over human partners
- Simulating Society Requires Simulating Thought
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
- Transcendence: Generative Models Can Outperform The Experts That Train Them
- Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
Original note title
ai models exceed individual human accuracy at predicting collective social norms — challenging strong embodiment requirements for cultural competence