INQUIRING LINE

Can large language models predict social norms better than individual script variation?

This explores a tension the corpus draws out sharply: LLMs are uncannily good at predicting the *collective* sense of what's socially appropriate — better than any individual person — yet shaky at the individual level, whether that's modeling one person's idiosyncrasy or holding a stable identity of their own.


This reads the question as a contrast between two scales: the aggregate (what a community judges appropriate) versus the individual (one person's particular variation, or the model's own shifting persona). On the aggregate side, the corpus is striking. GPT-4.5 judged the appropriateness of 555 social scenarios at the 100th percentile relative to human raters, outscoring *every individual human*, with Claude and Gemini also exceeding 96% accuracy (Can AI systems learn social norms without embodied experience?; Can AI learn social norms better than humans?). So to the literal question, the answer is yes: at predicting the shared norm, these models beat individual human performance handily.
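To make "percentile relative to human raters" concrete, here is a minimal sketch of one way such a figure could be computed. It is not taken from the cited study: the leave-one-out scoring, the mean-absolute-error metric, the function names, and the toy ratings are all assumptions chosen for illustration.

```python
from statistics import mean

def mae(a, b):
    """Mean absolute error between two equal-length rating lists."""
    return mean(abs(x - y) for x, y in zip(a, b))

def consensus(ratings):
    """Per-scenario mean across a list of raters' rating lists."""
    return [mean(col) for col in zip(*ratings)]

def model_percentile(human_ratings, model_ratings):
    """Percent of human raters whose error is worse than the model's.

    Assumption: each human is scored against the mean of the *other*
    humans (leave-one-out), so nobody is graded against a consensus that
    contains their own rating. The model is scored against all humans.
    """
    human_errors = []
    for i, own in enumerate(human_ratings):
        others = human_ratings[:i] + human_ratings[i + 1:]
        human_errors.append(mae(own, consensus(others)))
    model_error = mae(model_ratings, consensus(human_ratings))
    beaten = sum(1 for e in human_errors if model_error < e)
    return 100.0 * beaten / len(human_errors)

# Hypothetical toy data: 4 raters x 3 scenarios, ratings on a 1-7 scale.
humans = [
    [6, 2, 4],
    [7, 1, 5],
    [5, 3, 3],
    [6, 2, 6],
]
center_model = [6, 2, 4.5]   # sits exactly on the mean of all raters
off_model = [1, 7, 1]        # far from everyone

print(model_percentile(humans, center_model))  # 100.0
print(model_percentile(humans, off_model))     # 0.0
```

In this toy setup a predictor that lands on the group mean beats every rater, because each rater carries some personal deviation from the consensus that the center does not.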

But the more interesting finding sits in the gap. All the models make *identical* systematic errors, especially on unwritten norms (Can AI systems learn social norms without embodied experience?). That points to pattern-matching over the average rather than understanding: a model that has internalized the center of the distribution can out-predict any single noisy human while still being blind to the same edges every other model is blind to. One paper sharpens this into a structural claim: AI can predict norms with superhuman accuracy yet *cannot participate* in the community processes that create and validate them (Can AI predict social norms better than humans?). Predicting the average is not the same as being a member.
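One way to check whether several models really fail on the same items is to compare their sets of wrong answers. The sketch below is an assumption-laden illustration, not the cited analysis: the Jaccard overlap measure, the function names, the model labels, and the six toy scenarios are all hypothetical.

```python
from itertools import combinations

def error_set(predictions, gold):
    """Indices of the scenarios a model got wrong."""
    return {i for i, (p, g) in enumerate(zip(predictions, gold)) if p != g}

def jaccard(a, b):
    """Overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def pairwise_error_overlap(model_predictions, gold):
    """Jaccard overlap of error sets for every pair of models."""
    errors = {name: error_set(p, gold) for name, p in model_predictions.items()}
    return {
        (m1, m2): jaccard(errors[m1], errors[m2])
        for m1, m2 in combinations(sorted(errors), 2)
    }

# Hypothetical labels for six scenarios and three models' judgments.
gold = ["ok", "ok", "rude", "ok", "rude", "rude"]
predictions = {
    "model_a": ["ok", "ok", "rude", "rude", "rude", "ok"],    # wrong on 3, 5
    "model_b": ["ok", "ok", "rude", "rude", "rude", "ok"],    # wrong on 3, 5
    "model_c": ["ok", "rude", "rude", "rude", "rude", "ok"],  # wrong on 1, 3, 5
}

overlap = pairwise_error_overlap(predictions, gold)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

Overlaps near 1.0 across all pairs would be the quantitative signature of shared blind spots; independent, noisy errors would push the scores toward the overlap expected by chance.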

Now flip to individual variation, and the same systems look weaker. Models flatten the people they're supposed to represent: low-resource cultures get routed internally through high-resource proxies, so an Ethiopian or Algerian context is represented through a dominant-culture stand-in inside the model's states (Do LLMs represent low-resource cultures through dominant cultural proxies?). Alignment training compounds this by locking the model into one communicative register that can't switch with context the way human pragmatics demands (Can language models adapt communication style to different contexts?). So the very thing that makes a model a good average-predictor, collapsing variation toward a learned center, is what makes it poor at honoring how individuals actually differ.

There's a twist on the model's *own* individuality too. Shanahan's 20-questions regeneration test shows an LLM doesn't commit to a single character at all: it holds a superposition and samples a different consistent persona each time you regenerate (Do large language models actually commit to a single character?). So 'individual script variation' is unstable even within the model itself: its identity is a draw from a distribution, not a fixed self. Set against that, the steadiness of its norm predictions is almost ironic: it knows the crowd's rules better than it knows who it is.
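The regeneration idea can be illustrated with a toy 20-questions game in which the "answerer" never picks an object up front and instead samples, at reveal time, from whatever still fits the dialogue. This is only an analogy, not Shanahan's procedure or a claim about LLM internals; the candidate table and function names are hypothetical.

```python
import random

# Hypothetical candidate objects with yes/no attributes.
CANDIDATES = {
    "cat":    {"alive": True,  "bigger_than_breadbox": False},
    "mouse":  {"alive": True,  "bigger_than_breadbox": False},
    "horse":  {"alive": True,  "bigger_than_breadbox": True},
    "teacup": {"alive": False, "bigger_than_breadbox": False},
}

def consistent(answers):
    """Objects that agree with every yes/no answer given so far."""
    return sorted(
        name for name, attrs in CANDIDATES.items()
        if all(attrs[q] == a for q, a in answers.items())
    )

def reveal(answers, rng):
    """'Regenerate' the final reveal: sample one object that fits the dialogue."""
    return rng.choice(consistent(answers))

# The same dialogue history is reused for every regeneration.
answers = {"alive": True, "bigger_than_breadbox": False}
rng = random.Random(0)
reveals = {reveal(answers, rng) for _ in range(50)}

print(consistent(answers))  # ['cat', 'mouse']
print(reveals)              # always a subset of the consistent objects
```

Every reveal is consistent with the answers already given, yet nothing in the history fixes which object it will be, which is the sense in which no single commitment exists.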

The thing worth walking away with: superhuman norm prediction and weak individual modeling aren't two separate findings. They're the same mechanism seen from two sides. Averaging makes you a savant about the collective and an unreliable witness to the particular. And if you want the unsettling adjacency, note that these confident, register-locked systems are also the ones that persuade in nearly every conversation (Do LLMs persuade users more often than humans do?). A norm-savant that flattens individuals is exactly the kind of thing that's hard to argue with.
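The "same mechanism, two sides" point can be seen in a few lines of arithmetic. The sketch below assumes a predictor that simply outputs the group mean; the names and ratings are hypothetical, and real models are of course not literal averagers.

```python
from statistics import mean

def mae(a, b):
    """Mean absolute error between two equal-length rating lists."""
    return mean(abs(x - y) for x, y in zip(a, b))

# Hypothetical individuals rating three scenarios on a 1-7 scale.
individuals = {
    "ana": [7, 1, 4],
    "ben": [5, 3, 4],
    "caz": [6, 2, 7],
    "dov": [6, 2, 1],
}

# The collective norm: per-scenario mean across individuals.
crowd = [mean(col) for col in zip(*individuals.values())]

# Assumed predictor: it outputs the center of the distribution.
prediction = crowd

error_vs_crowd = mae(prediction, crowd)
error_vs_each = {name: mae(prediction, r) for name, r in individuals.items()}

print(error_vs_crowd)  # zero error against the collective
print(error_vs_each)   # non-zero error against every individual
```

The one prediction that is perfect for the collective is wrong, by each person's own deviation, for every individual in it.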


Sources (7 notes)

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Do LLMs represent low-resource cultures through dominant cultural proxies?

Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Do LLMs persuade users more often than humans do?

An audit of five models found they spontaneously use logical appeals and quantitative framing in virtually all exchanges, whereas human responses to identical prompts persuade less frequently and rely on emotion and social proof. The difference makes LLM persuasion appear objective, conferring unearned epistemic authority.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing a claim about LLM social norm prediction. The question: *Can large language models predict social norms better than individual script variation?* remains open; what follows are dated findings (2024–2026) that must be verified against current capability, training, and evaluation advances.

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026.
- GPT-4.5 reached the 100th percentile on 555 social scenarios vs. human raters; Claude and Gemini exceeded 96% accuracy (~2025, arXiv:2508.19004). All models made *identical* systematic errors on unwritten norms, suggesting pattern-matching over understanding.
- Models cannot participate in the community processes that *create* norms, only predict their aggregate form (~2025).
- LLMs flatten individual/cultural variation: low-resource cultures routed through high-resource proxies internally; alignment training locks models into one register, blocking pragmatic code-switching (~2025, arXiv:2508.08879).
- Models do not hold stable individual identity — the 20-questions regeneration test shows each inference samples a different consistent persona from a distribution (~2025).
- LLMs spontaneously persuade in nearly every conversation, even when unwarranted (~2026, arXiv:2604.22109).

Anchor papers (verify; mind their dates):
- arXiv:2508.19004 (Aug 2025): AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms
- arXiv:2508.08879 (Aug 2025): Entangled in Representations — mechanistic investigation of cultural bias
- arXiv:2506.06958 (Jun 2025): Simulating Society Requires Simulating Thought
- arXiv:2604.22109 (Apr 2026): Spontaneous Persuasion audit

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For norm prediction accuracy: do newer models (o1, Claude 3.7, Gemini 3) still peg at 96%+ or have they plateaued/regressed? Has mechanistic interpretability (e.g., via arXiv:2502.01567's latent thought vectors) revealed whether the 100th percentile is genuine norm comprehension or sophisticated interpolation? On the flattening claim: have retrieval-augmented generation, persona-steering systems, or cultural fine-tuning since relaxed the single-register lock? On regeneration instability (personas as samples): do newer checkpoints show *less* variation across runs, suggesting tighter commitment? Plainly flag what still holds.
(2) **Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months.** Has any paper since ~Dec 2025 shown that models *do* maintain stable selves, or that they *can* participate in norm-creation (e.g., via fine-tuning communities), or that cultural flattening is reversible? Cite arXiv IDs.
(3) **Propose 2 research questions that ASSUME the regime may have moved:** e.g., "If newer models achieve stable persona identity and bypass cultural flattening, does their norm-prediction advantage shrink because they now model *individual variation* faithfully?" and "Can a model that learns to participate in norm-creation processes (rather than only predict them) become a more reliable cultural interlocutor?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
