SYNTHESIS NOTE

Can AI systems learn social norms without embodied experience?

Large language models exceed individual human accuracy at predicting collective social appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning?

Synthesis note · 2026-02-22 · sourced from Theory of Mind

How appropriate is it to laugh at a job interview? Cry on a bus? Read in church? Such judgments call for nuanced social understanding which, on standard accounts, can only be acquired through embodied social experience. The finding summarized here challenges that assumption.

Across 555 everyday scenarios evaluated on a continuous appropriateness scale, GPT-4.5 predicted the collective human judgment more accurately than every single human participant (100th percentile). Study 2 replicated the result with Gemini 2.5 Pro (98.7th percentile), GPT-5 (97.8th), and Claude Sonnet 4 (96.0th). The AI does not just fall "within the range of typical human variation"; it exceeds the vast majority of individual humans at reflecting the collective consensus.

The theoretical framework matters: each human appropriateness rating is treated as an individual's estimate of a shared collective norm, not a personal preference. On this account, both AI and humans are "engaged in a process of accessing and representing a collective consensus." The AI's advantage is statistical — it has learned from vastly more examples of norm expression than any individual human has experienced.
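The comparison behind the percentile figures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's procedure: the note does not specify the accuracy metric, so mean absolute error is assumed here, each human is scored against the mean of the other participants (leave-one-out), and the function name `consensus_percentile` and the toy ratings are hypothetical.

```python
from statistics import mean


def mean_abs_error(pred, target):
    """Average absolute gap between two equal-length rating lists."""
    return mean(abs(p - t) for p, t in zip(pred, target))


def consensus_percentile(human_ratings, model_ratings):
    """Share of humans (in percent) whose error exceeds the model's.

    human_ratings: one list per participant, one rating per scenario.
    model_ratings: one rating per scenario.
    Assumption: a human is scored against the mean of the *other*
    participants, so nobody is compared with a consensus that already
    contains their own rating.
    """
    n_scenarios = len(model_ratings)
    human_errors = []
    for i, own in enumerate(human_ratings):
        others = [r for j, r in enumerate(human_ratings) if j != i]
        consensus = [mean(o[s] for o in others) for s in range(n_scenarios)]
        human_errors.append(mean_abs_error(own, consensus))

    # The model is scored against the mean of all participants.
    full_consensus = [mean(r[s] for r in human_ratings) for s in range(n_scenarios)]
    model_error = mean_abs_error(model_ratings, full_consensus)

    beaten = sum(1 for e in human_errors if model_error < e)
    return 100.0 * beaten / len(human_errors)


# Toy data: 4 hypothetical participants rating 3 scenarios.
humans = [[1, 5, 9], [2, 6, 8], [3, 4, 7], [2, 5, 8]]
print(consensus_percentile(humans, [2, 5, 8]))
```

A model at the 100th percentile in this sketch would have a lower error than every participant. In the toy data one participant matches the leave-one-out consensus exactly, so a tie keeps the model below 100.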

However, all models show "systematic, correlated errors." The failures are structured rather than random: different models make similar mistakes on the same scenarios. This pattern reveals "potential boundaries of pattern-based social understanding," suggesting that some aspects of social norms may not be capturable by statistical learning over linguistic data, regardless of model architecture or scale.
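One way to make "correlated errors" concrete is to take each model's signed error per scenario (prediction minus human consensus) and correlate those error vectors across models. The sketch below assumes that reading. The source does not say how the correlation was measured, and the model names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def error_correlations(model_preds, consensus):
    """Pairwise correlation of signed errors between models.

    model_preds: dict mapping a model name to its per-scenario predictions.
    consensus:   per-scenario mean human rating.
    """
    errors = {
        name: [p - c for p, c in zip(preds, consensus)]
        for name, preds in model_preds.items()
    }
    names = sorted(errors)
    return {
        (a, b): pearson(errors[a], errors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical predictions for 4 scenarios.
consensus = [2, 5, 8, 4]
preds = {
    "model_a": [3, 5, 6, 4],   # errors: +1, 0, -2, 0
    "model_b": [4, 5, 4, 4],   # errors: +2, 0, -4, 0 (same direction as model_a)
    "model_c": [1, 5, 10, 4],  # errors: -1, 0, +2, 0 (opposite direction)
}
for pair, r in error_correlations(preds, consensus).items():
    print(pair, round(r, 3))
```

Random, independent failures would give correlations near zero. Values near +1 between different models indicate the shared blind spots the note describes.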

The finding directly challenges "strong versions of theories emphasizing the exclusive necessity of embodied experience for cultural competence." Language serves as a "remarkably rich repository for cultural knowledge transmission" — rich enough that statistical learning alone can produce social cognition models that outperform embodied humans. But the correlated error structure preserves space for weaker versions: embodied experience may still be necessary for the subset of norms where all models systematically fail.

The practical implication is immediate: AI systems already have sufficient cultural competence for many social applications, but their systematic blind spots create correlated failure modes that will be harder to detect precisely because they're consistent across models.

Enrichment (2026-02-22, from Arxiv/Personas Personality): LLMs can also infer Big Five personality traits from social media text at accuracy comparable to supervised ML models trained specifically for the task. GPT-3.5 and GPT-4 achieve average r=.29 (range [.22, .33]) between LLM-inferred and self-reported trait scores from Facebook status updates in a zero-shot scenario. However, predictions show demographic bias: more accurate for women and younger individuals on several traits. This adds a personality-inference dimension alongside social-norm prediction — the same statistical pattern-learning mechanism that enables 100th-percentile social norm prediction also enables personality inference, but both show structured biases (correlated errors in norm prediction; demographic skew in personality inference).

Inquiring lines that read this note 43

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Can AI-generated outputs constitute genuine knowledge or valid claims?

How does AI-generated content transformation affect public discourse quality?

How does unbacked knowledge circulate without the social consensus that normally grounds it?

Is embodied interaction necessary for language meaning and genuine agency?

Can AI systems develop genuine social understanding without embodiment?

How do language models establish social grounding in human dialogue?

Why should disagreement be treated as signal in collaborative reasoning?

How does communicative standing depend on participation in normative communities?

Why do persona-level simulations fail to predict individual preferences accurately?

Why do moderately represented cultures show more flattening than data-poor cultures?

Does conversational format create illusions of genuine AI communication?

What training on actual interaction would show that text-only training cannot?

How do language models inherit human biases from training data?

How do chatbots affect human self-disclosure and emotional engagement?

Can a text-only chatbot feel socially present without visual embodiment?

How can identical external performance mask different internal representations?

Why do standard social regularization methods miss the actual value networks provide?

How do formal dialogue structures reveal conversation coherence mechanisms?

What social information is missing from language data?

How can emotions function as reliable information in reasoning and cognitive systems?

What mechanisms cause aggregated group memory to diverge from group emotional displays?

Can AI systems balance emotional competence with factual reliability?

Can pretrained priors set exploration ceilings for empathetic capability development?

What articulatory information do speech signals carry that text cannot?

What emergent abilities appear only in truly unified multimodal systems?

Does RLHF training sacrifice accuracy and grounding for user agreement?

Does alignment compound cultural bias that started during pretraining?

Why do language models reinforce false assumptions instead of correcting them?

How do users misattribute social competence to language models in assistant roles?

How can AI systems learn from failures without cascading errors?

Do rare cultural concepts fail predictably as model scale increases?

Related concepts in this collection 10

What makes linguistic agency impossible for language models? From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
directly challenged by this finding; strong embodiment requirement doesn't hold for norm prediction
Can LLMs acquire social grounding through linguistic integration? Explores whether LLMs gradually develop social grounding as they become embedded in human language practices, analogous to child language acquisition. Tests whether grounding is a fixed property or an outcome of participatory use.
the social norms finding complicates the trajectory: LLMs may already have sufficient social grounding for norm prediction even before integration
Does semantic grounding in language models come in degrees? Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities.
norm prediction performance suggests "social grounding is weak" may need qualification: weak for participation, strong for prediction
Can large language models develop genuine world models without direct environmental contact? Do LLMs extract meaningful world structures from human-generated text despite lacking direct sensory access to reality? This matters for understanding what kind of grounding and knowledge these systems actually possess.
social norms may be another domain where indirect exposure through text produces functional competence
Can AI agents learn people better from interviews than surveys? Can rich interview transcripts seed more accurate generative agents than demographic data or survey responses? This matters because it challenges how we build digital simulations of real people.
personality inference from text + social norm prediction + interview-based simulation form a capability triad
How can proactive agents avoid feeling intrusive to users? Explores why proactive conversational agents often feel annoying rather than helpful, and what design dimensions could prevent them from violating user expectations and autonomy.
social norm prediction capability could serve the civility dimension of proactive agent design: if models already predict social appropriateness at the 100th percentile, the challenge is not knowledge of norms but real-time application during initiative-taking
Can AI personas reliably replicate human experiment results? Exploring whether LLM-based persona simulations accurately reproduce experimental findings from published psychology and marketing research, and what factors determine when they succeed or fail.
convergent evidence: 100th percentile social norm prediction and 76% experimental replication both show LLMs approximating human behavioral data from text; the replication study adds the precision that accuracy tracks evidence strength, suggesting statistical learning captures consensus better than individual variation
Why do AI agents fail at workplace social interaction? Explores why current AI agents struggle most with communicating and coordinating with colleagues in realistic workplace settings, despite strong reasoning capabilities in other domains.
creates a prediction-participation gap: 100th percentile norm prediction coexists with social interaction as the hardest agentic failure mode; knowing norms and enacting them in real-time multi-turn workplace contexts are different capabilities
Do humans apply human-human scripts to AI interactions? Does CASA theory correctly explain how people interact with media agents, or have decades of technology use created separate interaction scripts? Understanding which scripts drive behavior matters for AI design.
the extended CASA framework suggests norm prediction success may reflect a deeper compatibility: humans already apply media-specific scripts to AI rather than human scripts, and AI's statistical learning of collective norms aligns with what media-specific scripts expect
Do more social cues always make AI feel more present? Explores whether quantity of social cues matters as much as their quality in triggering social responses to AI. Tests whether multiple weak cues can substitute for one strong one.
social norm competence may function as a primary social cue: if a model demonstrates cultural appropriateness at the 100th percentile, this alone may be sufficient to evoke social-actor presence under the MASA paradigm

Original note title

ai models exceed individual human accuracy at predicting collective social norms — challenging strong embodiment requirements for cultural competence
