INQUIRING LINE

Why do language models approximate collective human judgment better than individuals?

This explores why an LLM can match the average of a crowd's judgment more reliably than it matches any one person — and what that gap reveals about how these models actually learn.


This explores why language models seem to capture collective human judgment so well while stumbling on individuals. The clearest evidence sits in work on social norms: GPT-4.5 out-predicted *every* individual human at judging whether behavior was socially appropriate across hundreds of scenarios Can AI learn social norms better than humans?. That sounds like superhuman insight, but the mechanism is humbler. A model trained on the aggregate textual output of millions of people converges on the *consensus* view. Any single human is noisier than the average — we each carry idiosyncratic blind spots — so beating an individual is partly just beating noise with a smoothed mean.

The same pattern shows up when models are tuned on actual behavioral data: LLMs finetuned on psychology experiments predict human decisions better than the theory-driven cognitive models built to explain them, and they even encode individual differences in their embeddings Can language models learn to model human decision making?. So the collective-vs-individual story isn't that models *can't* represent persons — it's that they're optimized to reproduce population-level regularities, and the individual signal is weaker and harder to anchor.

Where it breaks is exactly where the individual matters. Models fail to track how a *particular* person's reasoning style evolves over time, leaning on surface lexical cues instead of adapting to a developing strategy Can models recognize how individuals reason differently?. And in open-ended perspective-taking, LLMs default to surface strategies rather than genuinely simulating one mind's beliefs — the gap looks architectural, not just a matter of more training Do large language models genuinely simulate mental states?. The crowd is legible because it's an average; the individual requires modeling a moving, specific target.

There's a sharper twist worth knowing. "Approximating collective judgment" hides a bias toward the *dominant* collective. Mechanistic analysis shows low-resource cultures get internally represented through high-resource proxies — the model flattens minority groups into the majority's coordinates Do LLMs represent low-resource cultures through dominant cultural proxies?. So the "collective" a model approximates isn't all of humanity; it's the loudest, most-represented slice. And even on social norms, all the models share *identical* systematic errors on unwritten rules Can AI predict social norms better than humans? — a tell that they're pattern-matching a corpus, not participating in the living process that makes norms.

The deepest framing here: a model can out-score humans as an *observed* predictor of group behavior while remaining categorically unable to *participate* in how that behavior gets made. From the outside, it mirrors the collective; from the inside, it never enters the discourse that produces individual stances and evolving norms Do humans and LLMs differ fundamentally or just superficially?. That's the real answer to "why" — collective judgment is something you can statistically reconstruct from text, while individual judgment is something you have to track, and authentic norms are something you have to help create.


Sources 7 notes

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.

Can models recognize how individuals reason differently?

LLMs struggle to anchor reasoning in temporal gameplay and adapt to evolving strategies. GPT-4o relies on surface lexical cues while DeepSeek-R1 shows early promise, but dynamic style adaptation remains largely insufficient across all models tested.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Do LLMs represent low-resource cultures through dominant cultural proxies?

Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.

Do humans and LLMs differ fundamentally or just superficially?

Applied Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst evaluating whether LLMs' documented superiority at predicting collective human judgment over individuals still holds, or whether recent capability shifts have reframed the question.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025. A curated library documented:
• GPT models out-predict every individual human at social-norm judgment across hundreds of scenarios; mechanism is consensus-averaging, not superhuman insight (2025-08, arXiv:2508.19004)
• LLMs fail to track how *particular* persons' reasoning styles evolve over time, relying on surface lexical cues instead of adaptive modeling (2024-04)
• Models default to surface-level theory-of-mind strategies rather than genuine belief simulation — framed as architectural, not training-addressable (2025-02, arXiv:2502.08796)
• Mechanistic analysis reveals low-resource cultures internally represented via high-resource proxies; models flatten minority groups into majority coordinates (2025-08, arXiv:2508.08879)
• All models share *identical* systematic errors on unwritten social rules — evidence of corpus pattern-matching, not norm-participation (2025-08)

Anchor papers (verify; mind their dates):
• arXiv:2508.19004 (2025-08) — social-norm prediction superiority
• arXiv:2502.08796 (2025-02) — theory-of-mind task evaluation
• arXiv:2508.08879 (2025-08) — cultural bias mechanistics
• arXiv:2507.20252 (2025-07) — post-completion learning

Your task:
(1) RE-TEST EACH CONSTRAINT. For the five findings above, judge whether newer models (o1, Claude-4, Llama-406B), fine-tuning on individual-trajectory datasets, multi-turn adaptive prompting, or mechanistic steering (via latent thought vectors, arXiv:2502.01567) have since relaxed the individual-tracking or cultural-flattening gaps. Separate the durable claim (collective > individual due to averaging) from perishable limits (architecture can't do adaptive or minority-aware reasoning).
(2) Surface the strongest work from the last 6 months that *contradicts* the participation-vs.-prediction divide or shows models *do* internalize individual trajectories or minority norms.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., "Can LLMs learn individual preferences *within* a single conversation without retraining?" or "Do newer models reduce cultural-flattening when prompted to steward minority voices?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines