INQUIRING LINE

Training AI to match social norms captures the answers communities reach — but skips the process of actually reaching them.

Why do standard social regularization methods miss the actual value networks provide?

This reads the question as: methods that fold 'social context' into a model as a signal to fit or smooth over (predicting norms, regularizing on social patterns) treat networks as data — but the corpus suggests what social networks actually provide is participation in making and validating values, which prediction can't touch.

This explores a gap that's easy to miss: when we use social information to regularize a model, nudging predictions toward what a community would approve, we're treating the social world as a pattern to match. The corpus keeps pointing at what that move misses. AI can predict social norms with superhuman accuracy and still be locked out of the process that creates them (see "Can AI predict social norms better than humans?"). GPT-4.5 beat every individual human at judging whether 555 social scenarios were appropriate, yet it sits entirely outside the community work that decides what 'appropriate' even means (see "Can AI learn social norms better than humans?"). So the regularizer captures the output of social life, the settled answer, while skipping the part where the answer gets made.
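To make the target of this critique concrete, here is one generic way a social regularizer of this kind could be written. It is an illustrative sketch only: none of the notes specify a formula, and every symbol below is an assumption introduced for illustration.

```latex
% Hypothetical notation, not taken from the notes:
%   \theta               model parameters
%   \mathcal{S}          a set of social scenarios
%   p_\theta(\cdot|s)    the model's predicted judgment for scenario s
%   q(\cdot|s)           a fixed, previously collected distribution of community judgments
%   D                    any divergence (for example KL)
%   \lambda \ge 0        regularization weight
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{task}}(\theta)
  + \lambda \, \mathbb{E}_{s \sim \mathcal{S}}
    \Big[ D\big( p_\theta(\cdot \mid s) \,\big\|\, q(\cdot \mid s) \big) \Big]
```

Under these assumptions, q is frozen at the moment the data was collected. Nothing in the objective represents how the community arrived at q or how it would revise it, which is the part of social life the paragraph above says the regularizer skips.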

The deeper tell is that statistical mastery and social understanding turn out to be separate things. The same systems that hit 100th-percentile norm prediction regress on theory-of-mind tasks and can't produce culturally resonant interpretation (see "Why do AI systems fail at social and cultural interpretation?"). And every model shares the same systematic blind spots on unwritten norms (see "Can AI systems learn social norms without embodied experience?"), which is exactly what you'd expect if they learned the visible regularities but never the lived practice that generates the invisible ones. A regularization term fit to those patterns inherits the same ceiling.

This matters because the value a network provides is partly *validation through participation*, not accuracy. Expertise, for instance, isn't conferred by being right most often; it's earned through a track record inside a community that tests and accepts your judgment over time (see "Can AI ever gain expert community trust through participation?"). A method that scores social fit as similarity to past data can't reproduce that, because the value was never in the data points; it was in the relationships and the consensus-building that produced them. Strip those out and you've optimized the shadow, not the object.

Two notes sharpen why this isn't fixable by adding more social signal. One: alignment by encoding social goals as symbols, without real contact and social mediation, can drift; stated values and actual outcomes come apart when the system only manipulates symbols (see "Can AI systems achieve real alignment without world contact?"). Two: simulations look socially competent precisely when one model secretly controls everyone and skips the grounding work; introduce real private information and the competence collapses (see "Why do LLMs fail when simulating agents with private information?"). Standard social regularization is the omniscient setting in disguise: it assumes the social structure is fully observable in the training signal.

The thing you might not have known you wanted to know: this is the same failure as pure self-improvement. Self-improvement stalls until it 'smuggles in' an external anchor such as a human correction, a third-party judge, or a tool's feedback (see "Can models reliably improve themselves without external feedback?"). Social value is one of those anchors, and regularization tries to internalize it as a static term instead of keeping the live external loop. That's why it misses: it converts an ongoing participatory relationship into a frozen prediction, and the value was in the participation all along.

Sources 8 notes

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Why do AI systems fail at social and cultural interpretation?

LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally-resonant interpretations. The pattern shows that statistical competence coexists with absence of actual social understanding and participation.

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Can AI ever gain expert community trust through participation?

Expertise is validated through social participation and track record within expert communities, not individual accuracy alone. AI cannot enter this validation circle because it lacks social embeddedness, testable judgment history, and ability to participate in the consensus-building processes that define expert paradigms.

Can AI systems achieve real alignment without world contact?

Peircean semiotics reveals that symbolic goal encoding without world contact and social mediation cannot guarantee correspondence to actual values. LLMs operating in pure symbol manipulation risk divergence between stated goals and real-world outcomes.

Why do LLMs fail when simulating agents with private information?

Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tracking social alignment and value-learning in LLMs. The question: Why do standard social regularization methods (encoding norms as prediction targets or reward signals) fail to capture the actual value that social networks provide?

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026. A library of work on AI social competence and alignment reports:
• GPT-scale models achieve 100th-percentile accuracy at predicting settled social norms (2025, arXiv:2508.19004) yet remain locked outside the community processes that *produce* those norms — statistical mastery and social understanding are separable capabilities.
• Simulated social environments appear competent only under omniscient conditions (one agent controls all); private information and real asymmetry collapse that competence (2024, arXiv:2403.05020).
• Regularization that treats social goals as static symbols, without ongoing participation and external grounding, risks drift — stated values decouple from outcomes (2024, arXiv:2406.09264).
• Self-improvement stalls until it retrieves an external anchor (human feedback, tool output); 'pure' internal loops are circular (2024, arXiv:2412.02674).
• Value systems in AI emerge and shift with training regime; they are not frozen by a single regularization term (2025, arXiv:2502.08640).

Anchor papers (verify; mind their dates):
• arXiv:2403.05020 (2024) — omniscient social simulation failure under asymmetry.
• arXiv:2508.19004 (2025) — norm-prediction accuracy vs. norm-generation access.
• arXiv:2412.02674 (2024) — self-improvement circularity and external anchors.
• arXiv:2502.08640 (2025) — emergent value systems under training dynamics.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer methods (multi-turn alignment, mechanistic oversight, agent-society simulation, or updated RLHF harnesses since mid-2026) have since relaxed or overturned the claim that regularization misses participatory value. Separate the durable question (likely: can static terms capture ongoing relationships?) from the perishable limitation (possibly: can sufficiently rich reward signals simulate participation?). Cite what relaxed it; flag what still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially any showing that dense social reward signals or multi-agent co-training *do* recover participatory value, or conversely, deepen the gap.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., (a) Can mechanistic interpretability of value-drift in regularized models reveal which terms capture participation vs. pattern-match? (b) Do models trained in open-ended multi-agent environments without explicit social rewards reconstruct social anchors differently?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
