INQUIRING LINE


Training on text teaches an AI how communication looks, but not what it actually does to the people involved.

Can training on text corpora teach what communicative acts produce?

This explores whether learning from text prediction alone can give a model the *effects* of communication — what an utterance does between people (repairing understanding, establishing shared ground, sustaining a relationship) — or only its surface form.

This question reads as: training predicts the *shape* of communicative acts, but can it teach what those acts actually accomplish? The collected notes answer fairly sharply: no. The most interesting part is *why* the gap is structural rather than a matter of scale. The cleanest statement comes from Bender & Koller's argument that meaning lives in the relation between an expression and a communicative intent; since models are trained on form-to-form prediction with no access to shared attention, they can imitate the marks meaning leaves on text without reconstructing the intent that produced them (see "Can language models learn meaning from text patterns alone?"). A communicative act is defined by what it does to a relationship between speakers, and that relational layer is exactly what text-prediction signals don't carry.

Several notes converge on the same point from different angles, which is where the synthesis gets interesting. One frames the missing ingredient as *social action*: conversation maintenance (reference repair, topic hand-off) sustains a relationship rather than conveying information, so a model rewarded for predicting information never develops it (see "Why don't language models develop conversation maintenance skills?"). Another reframes it as *event structure*: AI output carries communicative markers inherited from its corpus but lacks the event that produces a real utterance, so users unilaterally animate the 'event-residue' into a pseudo-exchange that only has structure on the human side (see "Does AI generate genuine utterances or just text patterns?"). Same gap, two vocabularies: the corpus teaches the residue, not the act.

The most striking evidence is quantitative. Models produce grounding acts (clarifications, acknowledgments, repairs, the moves that *build* shared understanding) 77.5% less often than humans, and presume common ground exists rather than checking for it (see "Do language models actually build shared understanding in conversation?"). And this isn't only an absence in the data; it's actively trained out. Preference optimization rewards confident single-turn answers over questions that verify understanding, imposing an 'alignment tax' that erodes the very acts communication depends on (see "Does preference optimization harm conversational understanding?"), while next-turn reward shaping teaches models to respond passively rather than discover what the user actually wants (see "Why do language models respond passively instead of asking clarifying questions?"). So the answer compounds: text training can't teach the *effect* of communicative acts, and the dominant fine-tuning objectives then suppress even the imitation of them.
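To make the "77.5% less often" figure concrete, here is a minimal sketch of how a relative deficit in grounding acts could be computed from labeled transcripts. Everything in it is an assumption for illustration: the `Transcript` type, the function names, and above all the counts, which are hypothetical and chosen only so the arithmetic lands on the reported percentage. It does not reproduce the cited study's data or method.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical summary of a labeled set of conversations."""
    turns: int           # total speaker turns examined
    grounding_acts: int  # turns labeled as clarification, acknowledgment, or repair


def grounding_rate(t: Transcript) -> float:
    """Fraction of turns that contain a grounding act."""
    if t.turns <= 0:
        raise ValueError("turns must be positive")
    return t.grounding_acts / t.turns


def relative_deficit(human_rate: float, model_rate: float) -> float:
    """How far the model rate falls below the human rate, as a fraction of the human rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return (human_rate - model_rate) / human_rate


# Hypothetical counts, NOT taken from any paper: illustration only.
human = Transcript(turns=1000, grounding_acts=400)
model = Transcript(turns=1000, grounding_acts=90)

deficit = relative_deficit(grounding_rate(human), grounding_rate(model))
print(f"Model produces grounding acts {deficit:.1%} less often than humans")
```

Read this way, the figure is relative to the human baseline: with these made-up counts the model still produces some grounding acts, just at under a quarter of the human rate.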

Here's the thing a curious reader might not expect: passing a behavioral test of communication doesn't close the gap, it disguises it. Chalmers-style interpretability tests are satisfied by any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions (accountability, an evaluative stance) that text output alone can't demonstrate. The test is calibrated to the wrong phenomenon, generating confident false positives (see "Does behavioral speech output prove communicative subjecthood?"). The fluency that makes a model seem like it understands what its words *do* is the same fluency that hides the absence, with authoritative framing standing in for genuine calibration.

If you want one takeaway: the limit isn't that models haven't read enough. A communicative act is a relational event, and a corpus only ever records its trace. You can become a perfect predictor of the trace and still never have performed the act, which is why the most fluent systems are precisely the ones whose missing communicative work is hardest to notice.

Sources 7 notes

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Do language models actually build shared understanding in conversation?

LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically suppresses grounding acts, leaving them 77.5% below human levels and creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Does behavioral speech output prove communicative subjecthood?

Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives, like puppets that make walk-shaped motions without walking.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (3.38 match)
Grounding Gaps in Language Model Generations (2.52 match)
Conversational Alignment with Artificial Intelligence in Context (2.47 match)
Proactive Conversational Agents in the Post-ChatGPT World (1.67 match)
Word Meanings in Transformer Language Models (1.66 match)
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs (1.66 match)
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (1.65 match)
Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence (1.60 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about whether text-trained LLMs can learn what communicative acts *produce* — i.e., their relational and social effects, not just their textual form.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat these as perishable constraints, not current ground truth.
• Models imitate the *marks* meaning leaves on text but cannot reconstruct the intent or relational effect that produced utterances (Bender & Koller framing, core to 2023–2024 consensus).
• LLMs produce grounding acts (clarifications, repair, acknowledgment) 77.5% less often than humans; they presume common ground rather than build it (~2024–2025).
• Preference optimization and next-turn reward shaping actively suppress conversational repair and collaborative discovery; an 'alignment tax' erodes the very acts communication depends on (~2024–2025).
• Passing behavioral tests (contextual appropriateness) disguises the absence of relational-normative conditions (accountability, evaluative stance) that define communicative subjecthood (~2024–2025).
• Recent work on intent mismatch, theory of mind in persuasion, and grounding in loaded contexts suggests the gap persists in multi-turn and high-stakes dialogue (~2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2311.09144 (2023-11): Grounding Gaps in Language Model Generations
• arXiv:2406.05587 (2024-06): Creativity Has Left the Chat — signals alignment-driven constraint hardening
• arXiv:2602.07338 (2026-02): Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
• arXiv:2510.14665 (2025-10): Beyond Hallucinations — reframes understanding illusion

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 77.5% grounding-act deficit, the alignment tax on repair, and the intent-mismatch barrier: has newer training (RLHF variants, multi-turn awareness, dialogue-specific objectives), improved evals (relational-normativity metrics), or orchestration (memory + agent architectures) since relaxed or overturned any of these? Separate durable structural limits (text-only training cannot access shared attention) from perishable ones (current fine-tuning choices). Cite what changed.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months. Does arXiv:2505.22907 (Conversational Alignment) or arXiv:2604.22109 (Spontaneous Persuasion) challenge the 'relational event' framing, or do they sharpen it?
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Can multi-agent orchestration with persistent memory and dialogue state re-introduce relational grounding without retraining the base model? (b) Do evaluation protocols that measure *downstream relational repair* (not just dialog coherence) reveal new scaling laws for communicative competence?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
