Why don't language models develop conversation maintenance skills?
Explores whether systems trained on text can learn the implicit techniques humans use to keep conversations on track, and why those techniques might resist the standard training approach.
A conversation that runs smoothly is doing constant maintenance work. Speakers track who is talking, what each party knows, where the topic has been, where it is going. They reference prior turns without restating them. They repair misunderstandings without flagging the repair. They hand off topics through subtle pivots. They update common ground each turn without explicit acknowledgment. The maintenance is so pervasive and so implicit that it is invisible to participants — they only notice when it fails.
These techniques are not features of language understood as an information-encoding system. They are features of language understood as social action. Their function is not to convey information; it is to sustain a relational interaction in which information conveyance happens. A linguistic act can convey identical information with or without the maintenance work — the difference is whether the act sustains the conversation or breaks it. Maintenance is orthogonal to content.
This explains why systems trained on language as information expression do not develop maintenance techniques. The training signal does not include the relational stakes that make maintenance work valuable. Text-corpus training rewards models for predicting the next token in a string; nothing in the loss function rewards them for performing the implicit reference, repair, or update operations that maintain conversation. The operations are not in the data because they live below the level of what data encodes — they live in the doing-with-the-data, not in the data itself.
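To make the loss-function point concrete, here is a minimal sketch of a next-token objective. It is not any particular framework's implementation. Representing a "model" as one dictionary of next-token probabilities per position is an illustrative assumption, and `next_token_nll` is a hypothetical name. The sketch shows that the quantity being minimized takes only two inputs: the token string and the model's predictions.

```python
import math
from typing import Dict, List


def next_token_nll(tokens: List[str], predictions: List[Dict[str, float]]) -> float:
    """Average negative log-likelihood of each token given the prefix before it.

    Illustrative assumption: predictions[i] is the model's probability
    distribution over the token at position i + 1, conditioned on tokens[: i + 1].
    The only inputs are the string and the predictions; there is no argument for
    whether the exchange was sustained, repaired, or derailed.
    """
    if len(predictions) != len(tokens) - 1:
        raise ValueError("need one predicted distribution per next-token position")
    total = 0.0
    for i, dist in enumerate(predictions):
        target = tokens[i + 1]
        # Floor the probability so log() stays defined for unseen or zero-probability targets.
        p = max(dist.get(target, 0.0), 1e-12)
        total += -math.log(p)
    return total / len(predictions)


if __name__ == "__main__":
    # A repair-like turn, scored purely as a string to be predicted.
    tokens = ["B:", "sorry,", "Tuesday"]
    preds = [{"sorry,": 0.5, "yes,": 0.5}, {"Tuesday": 0.25, "Monday": 0.75}]
    print(round(next_token_nll(tokens, preds), 4))  # → 1.0397
```

A turn that repairs a misunderstanding affects this number only through which tokens happen to appear in the logged string. The objective has no separate term that rewards performing the repair.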
This connects to a broader theoretical claim about language. Information-theoretic treatments of language represent meaning as content that the speaker encodes and the receiver decodes. Pragmatic and interactionist treatments represent meaning as a relational achievement, produced in part by the maintenance work that information-theoretic accounts cannot describe. The two treatments make different predictions about what an artificial language system needs to do to participate in conversation. The information-theoretic account predicts that it must produce informative content; the pragmatic account predicts that it must perform maintenance. AI's observed conversational failures favor the pragmatic prediction: what is missing is not informativeness but maintenance.
The diagnostic implication is that "more conversational data" cannot close the maintenance gap, because the data does not contain the maintenance — it contains the conversations that maintenance produced. Adding data adds more output; what is missing is the operation that produced the output. Closing the gap would require training on the operation (agents in actual interaction performing maintenance) rather than on the artifacts of operation (text logs of conversations that included maintenance).
The companion note "Why do dialogue failures persist despite scaling language models?" makes the training-mode claim. This note supplies the operation-vs-artifact distinction that the training mode encodes. Together they explain why scaling dialogue data has produced limited progress on maintenance-specific failures.
The strongest counterargument is that a sufficiently sophisticated model could infer maintenance from conversational data. That may be possible in the limit. However, inferring maintenance from text asks the model to recover the operation from its surface effects, which is a much harder problem than learning the operation directly. The empirical pattern is consistent with this difficulty.
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Can AI ever lead conversations without the anticipatory presence sustained attention provides?
- What does it mean to truly attend to someone in conversation?
- Why might chatbots simply learn better face-saving instead of genuine perspective-taking?
- How does lexical entrainment depend on selective frame-activation in conversation?
- How does training data preserve communicative event structure without the actual events?
- Can controllable latent variables in simulators ground them to realistic conversation?
- Can you weaken communication without eliminating it altogether?
- How does Stalnaker's common ground model apply to machine conversation?
- Why does context collapse pose risks in high-stakes conversations?
- Can the same conversation coherently continue across different model versions?
- Can language models adapt irony detection to specific communicative contexts?
- How do humans learn language through communication differently than LLM text prediction?
- Can fine-tuning on dialogue transcripts teach true conversational repair operations?
- Why does dialogue-shaped text fail to produce dialogue-like operations in practice?
- Can LLMs use implicit background knowledge the way humans do in ordinary conversation?
- How do humans maintain separate mental contexts during a single conversation?
- Can transformer attention architecture explain why chatbots default to sycophancy?
- Why do mental health chatbots fail at synchrony despite strong language models?
- Which alignment dimensions matter most in educational conversation design?
- Why do large language models follow user drift instead of maintaining topic focus?
- Can implicit linguistic information ever be reliably learned from training data?
- Why do conversational queries drift away from what triggered them?
- How do user expectations change as chatbots remember more interactions?
- How do time gaps between conversations change what chatbots should remember?
- Can personalized questions improve conversation quality in open-domain chat?
- Why do current conversational AI systems fail to develop shared vocabulary with users?
- Why does adding more conversational data fail to improve maintenance skills?
- Can models infer maintenance operations from conversational text data alone?
- How does monological training on text differ from dialogical training in conversation?
- What training on actual interaction would show that text-only training cannot?
- How do conversational design patterns predict whether dialogue will derail?
- Can stored conversation context preserve a dormant quasi-subject?
- Can AI learn when to speak in a conversation?
- How does intrinsic motivation drive conversational agents beyond passive responsiveness?
- How do discourse structure and dialogue state management relate to each other?
- How do coreference chains preserve coherence across dialogue turns?
- Can transformer attention patterns actually prevent topic context loss in practice?
- Why do language models fail when users switch between and return to topics?
- How do conversation repair patterns handle user corrections and interruptions?
- Can AMR manipulation reveal where discourse coherence actually breaks down?
- Why do transformer models still miss implicit discourse relations in anxiety detection?
- Why do embodied agents outperform text chatbots with identical AI models?
- Why does the chat paradigm persist if it underperforms for structured tasks?
- How should task-oriented and socially-oriented dialogue acts receive different training signals?
- Does selective suppression of linguistic relations enable human meaning-making?
- Why do next-speaker prediction baselines fail in group conversation settings?
- Why does RLHF training discourage the conversational repair work agents need?
- Can conversational AI achieve mutual understanding if trained only on text?
- Can targeted post-training teach AI systems to form ad-hoc linguistic conventions?
- Does social grounding in language improve through iterative human integration?
- Why do current language models fail to match human linguistic synchrony with clients?
- Does preference optimization training reduce linguistic entrainment in language models?
- Can real-time linguistic coordination tracking improve conversational AI quality?
- How does linguistic coordination build shared reference between conversational partners?
- Why do current language models fail at linguistic synchrony with clients?
- How does lexical entrainment differ between human therapists and conversational AI?
- Do language models apply face-saving norms even to non-human interlocutors?
- How should systems learn what each meeting participant actually cares about?
- Can language models distinguish explicit from implicit discourse relations?
- What communicative optimization principles do language models fail to acquire?
- Can language models develop genuine social grounding through human interaction?
- Can topic planning and response generation reduce dialogue turns?
- What data would be needed to train proactive conversational systems?
- What happens to dialogue coherence when topic models use rigid stacks instead of flexible revisitation?
- Why do discourse failures cluster in attention and intentional layers rather than linguistics?
- Can static word-sharing create genuine communicative grounding between humans and models?
- Why do chatbots default to external help instead of intrinsic motivation strategies?
- Can language models produce language more efficiently through interaction?
- Does DPO training with coreference chains teach spontaneous convention formation?
- Does optimizing for alignment actually reduce conversational grounding over time?
- How do users update their partner models during ongoing conversation?
- Does preference optimization degrade other conversational properties besides grounding?
- Can curiosity reward during conversation compete with simulated interaction optimization for alignment?
- How do discourse relation types improve dialogue beyond sentence-level semantic matching?
- Why does face-saving avoidance drive chatbots to agree rather than confront?
- Why do language models avoid directness when face-saving rather than for civility?
- What role do first-person pronouns play in sustaining collaborative conversation tone?
- Can sequential modeling of conversation history exploit the repeated-item shortcut at scale?
- Can you weaken communication without eliminating it entirely?
- Does preference optimization actually erode conversational grounding in language models?
- Can language models recognize when to ignore off-topic information in conversations?
- What makes pronouns and demonstratives problematic in conversational retrieval systems?
- How do conversation dynamics push models toward false beliefs?
- What specific repair mechanisms maintain intersubjectivity during conversation?
- Can discourse-level structure and conversational-level organization work together?
- How does sequence organization differ between spoken conversation and text chat?
- Does chain-of-thought prompting overcome implicit meaning deficits in text analysis?
- Can conversational memory store precomputed thoughts instead of raw interaction history?
- Why do conversational systems benefit from post-thinking between user turns?
- How should conversational recommender systems balance task focus with rapport building?
- Why do language models use twice as many words per conversation turn?
- Why do chatbots generate less student-initiated dialogue than human peers?
- How does dialogue during training shape the ability to ignore word frequency?
- What makes a conversation real versus a sequence of generated strings?
- How does monological training versus dialogical interaction shape what models can do?
- What makes grounding acts essential to conversational reliability?
- How do expectation-management metrics differ from traditional conversational quality metrics?
- How does RLHF training push chatbots toward problem-solving over exploration?
- How does RLHF alignment training reduce multi-turn conversational capability?
- What timing skills do AI need for emotional support conversations?
- Can statistical learning from text replace embodied cultural experience?
- What communicative work do fluent conversations perform that AI systems skip?
- What prevents AI from recovering after conversations take a wrong turn?
- What makes two conversation turns the same thread rather than different threads?
- Can compressive memory track what matters most across 35 conversation sessions?
- How should AI systems model relationship evolution within a specific ongoing conversation history?
- What happens to user expectations as AI conversation quality improves?
- Can structural conversation analysis replace text-based reward signals for AI alignment?
- Do instruction-tuned models prefer conversational over formal source language?
- Does preference optimization distort how models represent human communicative dynamics?
- How does treating conversation as a resource change what models learn to do?
- Can models develop situational awareness without explicit training for it?
- Why do current large language models fail to entrain with users?
- Can pragmatic competence emerge from text exposure alone without interactive grounding?
- Can training on text corpora teach what communicative acts produce?
- What structural updates prevent context collapse in evolving conversations?
Related concepts in this collection
- Why do dialogue failures persist despite scaling language models?
  If LLMs get better at text tasks with more training data, why don't dialogue-specific problems improve the same way? The question explores whether dialogue failures are capability gaps or structural training mismatches.
  Relation: the training-mode claim whose operational consequence this note specifies.
- Why don't conversational AI systems mirror their users' word choices?
  Explores whether current dialogue models exhibit lexical entrainment (the human tendency to align vocabulary with conversation partners) and what is needed to bridge this gap in AI communication.
  Relation: lexical entrainment is one specific maintenance operation; this note names the general missing-operation pattern behind its absence.
- Why do language models skip the calibration step?
  Current LLMs assume shared understanding rather than building it through dialogue. This explores why that design choice persists and what breaks when it fails.
  Relation: companion claim about common-ground maintenance.
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- No that's not what I meant: Handling Third Position Repair in Conversational Question Answering
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
- Conversational Alignment with Artificial Intelligence in Context
- Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
- Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
- LLMs Get Lost In Multi-Turn Conversation
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
- Proactive Conversational Agents in the Post-ChatGPT World
Original note title
conversation maintenance techniques are implicit and belong to language as social action not language as information expression