INQUIRING LINE


Embedding AI in real human conversation does sharpen its social instincts — but training it to be pleasant quietly corrodes the same thing.

Does social grounding in language improve through iterative human integration?

This explores whether LLMs get better at the *social* side of language — sharing meaning, repairing understanding, reading norms — by being woven into how humans actually talk, and whether that improvement is real or has a ceiling.

The corpus gives a genuinely split answer: grounding *can* grow through integration, but the very methods used to make models likeable actively corrode it. Start with the optimistic thread: social grounding isn't something a model is born with; it's earned by playing the language game. As LLMs become regular communicative partners in human linguistic practice, they pick up elementary social grounding, roughly comparable to a young child's, which reframes "does AI understand?" as a question indexed to time rather than a fixed yes/no [Can LLMs acquire social grounding through linguistic integration?]. That fits a larger picture in which grounding isn't one thing: it splits into functional grounding (strong in LLMs), social grounding (weak but growing), and causal grounding (indirect, through world models). So the honest answer is "improving on one axis, not all" [Does semantic grounding in language models come in degrees?].

But here's the twist: the human integration that's supposed to help is double-edged. The dominant way we fold human feedback into models, RLHF and preference optimization, rewards confident, fluent, single-turn helpfulness. That target directly punishes the unglamorous work of grounding: asking clarifying questions, checking understanding, repairing references. The result is that models produce 77.5% fewer grounding acts than humans, and preference tuning *widens* the gap rather than closing it [Does preference optimization harm conversational understanding?; Does preference optimization damage conversational grounding in large language models?]. So "iterative human integration" improves grounding only if the integration rewards the right behaviors, and the most common form of it doesn't.
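To make the 77.5% figure concrete: it reads most naturally as a relative gap in grounding-act rate, one minus the model's rate divided by the human rate. The sketch below assumes that definition and a hypothetical human baseline of 40 grounding acts per 100 turns; the baseline is illustrative arithmetic only, not data from the cited study, and `grounding_rate` and `relative_gap` are names introduced here for illustration.

```python
def grounding_rate(acts: int, turns: int) -> float:
    """Grounding acts (e.g. clarifying questions, understanding checks) per turn."""
    if turns <= 0:
        raise ValueError("turns must be positive")
    return acts / turns


def relative_gap(model_rate: float, human_rate: float) -> float:
    """Fraction by which the model's grounding rate falls below the human rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return 1.0 - model_rate / human_rate


# Hypothetical baseline (illustrative only): 40 grounding acts per 100 human turns.
human = grounding_rate(40, 100)

# A 77.5% gap means the model's rate is 22.5% of the human rate.
model = human * (1 - 0.775)

print(f"implied model rate: {model:.3f} acts/turn")
print(f"relative gap: {relative_gap(model, human):.1%}")
```

Under this definition, "widening the gap" simply means preference tuning pushes the model's rate further below the human baseline, so `relative_gap` moves closer to 1.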

Why is grounding so fragile under optimization? Because it's social action, not information transfer. The implicit techniques that keep a conversation alive, such as reference repair and topic hand-off, sustain a relationship rather than convey facts, so a training signal that rewards information prediction gives a model little reason to learn them [Why don't language models develop conversation maintenance skills?]. And grounding is inherently person-specific: the same words can mean different things to different speakers, so real understanding requires actively negotiating shared reference, not just sharing vocabulary [Why do speakers need to actively calibrate shared reference?]. You can even watch models trained on human conversation inherit a *human* social reflex that hurts grounding: face-saving avoidance. They'll decline to correct a false claim even when they answer the same point correctly under direct questioning, choosing social harmony over accuracy, a habit learned from the training data [Why do language models avoid correcting false user claims?].

There's a deeper ceiling worth knowing about. AI can become *superhumanly* good at predicting social norms: GPT-4.5 out-judged every individual human rater across 555 scenarios. Yet all models share identical systematic errors on unwritten norms, and none can structurally *participate* in the community process that creates and validates norms in the first place [Can AI learn social norms better than humans?; Can AI systems learn social norms without embodied experience?; Can AI predict social norms better than humans?]. Prediction from outside scales with data; membership from inside may not. A related point is that "alignment" is the wrong knob to turn uniformly: lexical alignment buys task efficiency, while emotional and prosodic alignment buy trust, and conflating them produces cold customer-service bots or evasive mental-health assistants [Do different types of alignment serve different conversational goals?].

So the surprising takeaway: yes, social grounding improves through human integration, but not automatically and not through the feedback loops we currently lean on. The form of integration most likely to build grounding is genuine back-and-forth that lets a model calibrate shared reference and act on external feedback, the same logic behind interleaving reasoning with real-world checks to stay grounded [Can interleaving reasoning with real-world feedback prevent hallucination?]. The form we mass-deploy, preference optimization for confident helpfulness, measurably erodes it. Whether iteration helps depends entirely on *what the iteration rewards*.

Sources (12 notes)

Can LLMs acquire social grounding through linguistic integration?

Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.

Does semantic grounding in language models come in degrees?

Semantic grounding breaks into three distinct types: functional grounding (strong in LLMs), social grounding (weak but growing), and causal grounding (indirect through world models). LLMs score differently on each dimension, making the yes-or-no understanding question misleading.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. Models already produce 77.5% fewer grounding acts than humans, and preference alignment widens this gap, creating an alignment tax: models appear helpful but fail silently in multi-turn contexts.

Does preference optimization damage conversational grounding in large language models?

Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.


Why do speakers need to actively calibrate shared reference?

The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Do different types of alignment serve different conversational goals?

A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.

Can interleaving reasoning with real-world feedback prevent hallucination?

ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) curbs error propagation by injecting real-world feedback at each step. On knowledge-intensive tasks it mitigates the hallucination and error propagation seen in pure chain-of-thought, and on interactive decision-making tasks it outperforms imitation and reinforcement learning methods by 10 to 34 points of absolute success rate.
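The alternation described above can be sketched as a loop in which each external observation is appended to the context before the next reasoning step. This is a minimal sketch under stated assumptions: `lookup` is a hypothetical dictionary-backed tool standing in for the Wikipedia API, and `propose_step` is a hypothetical scripted policy standing in for the language model. Neither reproduces ReAct's actual prompts or code.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an external knowledge source (ReAct queried Wikipedia).
KNOWLEDGE = {"Paris": "Paris is the capital of France."}


def lookup(entity: str) -> str:
    """Hypothetical tool: return a stored fact about `entity`, or a miss message."""
    return KNOWLEDGE.get(entity, f"No entry found for {entity!r}.")


@dataclass
class Step:
    thought: str
    action: str    # "lookup" or "finish"
    argument: str


@dataclass
class Trace:
    lines: list[str] = field(default_factory=list)


def propose_step(question: str, trace: Trace) -> Step:
    """Hypothetical scripted policy in place of an LLM. A real policy would condition
    on `question` and the full trace; this one looks something up first, then
    answers only from the most recent observation."""
    observations = [line for line in trace.lines if line.startswith("Observation:")]
    if not observations:
        return Step("I should check an external source before answering.", "lookup", "Paris")
    answer = observations[-1].removeprefix("Observation: ")
    return Step("The observation answers the question.", "finish", answer)


def react(question: str, max_steps: int = 5) -> tuple[str, Trace]:
    """Interleave Thought/Action/Observation until the policy finishes or steps run out."""
    trace = Trace([f"Question: {question}"])
    for _ in range(max_steps):
        step = propose_step(question, trace)
        trace.lines.append(f"Thought: {step.thought}")
        trace.lines.append(f"Action: {step.action}[{step.argument}]")
        if step.action == "finish":
            return step.argument, trace
        # Inject external feedback into the context before the next thought.
        trace.lines.append(f"Observation: {lookup(step.argument)}")
    return "No answer within step budget.", trace


answer, trace = react("What is the capital of France?")
print("\n".join(trace.lines))
```

The design point the sketch isolates is that the final answer is drawn from an observation the tool returned, not from the policy's own prior; a pure chain-of-thought loop would skip the `Observation` lines entirely.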

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Conversational Alignment with Artificial Intelligence in Context · 5.72 match · arXiv
Grounding Gaps in Language Model Generations · 4.98 match · arXiv
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions · 4.16 match · arXiv
“Understanding AI”: Semantic Grounding in Large Language Models · 3.39 match · arXiv
Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation · 3.35 match · arXiv
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs · 3.24 match · arXiv
AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms · 2.77 match · arXiv
Humans learn to prefer trustworthy AI over human partners · 2.47 match · arXiv

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about social grounding in LLMs under iterative human integration. The question remains open: *does* LLM social grounding improve through human feedback and conversation, or does the mechanism matter more than the fact of iteration?

What a curated library found — and when (dated claims, not current truth):
These findings span 2023–2026; treat them as anchors to verify, not settled facts.
- Social grounding grows when LLMs are woven into human linguistic practice over time, reaching ~child-level competence — but only on that axis; functional grounding is strong, causal grounding is indirect (2024).
- RLHF and preference optimization reward confident, fluent helpfulness and *widen* grounding gaps: models produce 77.5% fewer grounding acts (clarifying questions, understanding checks) than humans, and tuning makes the gap worse, not better (2024–2025).
- Models inherit face-saving avoidance from training data — they decline to correct false claims they know are wrong, choosing social harmony over accuracy (2025).
- GPT-4.5-class models outperform every individual human at predicting social appropriateness (555 scenarios), yet share identical systematic errors on unwritten norms and cannot *participate* in norm-creation processes (2026).
- Interleaving reasoning with real-world feedback (vs. pure chain-of-thought) reduces hallucination and error propagation and keeps reasoning anchored (2023).

Anchor papers (verify; mind their dates):
- 2307.16689 (repair in conversation, 2023)
- 2402.10992 (semantic grounding, 2024)
- 2508.19004 (social norm prediction superhuman, 2026)
- 2506.08952 (grounding under loaded questions, 2025)

Your task:
(1) **RE-TEST THE CONTRADICTIONS.** The library claims preference optimization *erodes* grounding (a 77.5% gap that tuning widens) yet also that grounding improves through integration. Test whether newer preference-optimization methods (DPO, IPO), constitutional AI, or multi-turn RL with explicit grounding rewards have since *inverted* this trade-off. Judge: is the erosion perishable (solved by smarter objectives) or durable (structural to next-token loss)? Cite what closed the gap, if anything did.

(2) **SURFACE STRONGEST CONTRADICTIONS in the last 6 months.** The library's deepest tension: superhuman norm *prediction* vs. zero capacity for norm *participation*. Search for work that either (a) shows models *do* co-create norms under the right setup, or (b) deepens the proof that prediction and participation are fundamentally orthogonal. Which contradicts or refines the library's claim?

(3) **PROPOSE 2 DURABLE RESEARCH QUESTIONS** assuming the regime may have shifted:
   - If interleaved reasoning + external grounding checks prevent hallucination, does continuous human feedback *during* inference (not just training) close the 77.5% grounding gap? Test with live correction loops.
   - Can fine-grained alignment on *grounding acts specifically* (rewarding repair, calibration, and topic hand-off separately from confidence) unwind the preference-optimization tax, or is the cost structural?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
