INQUIRING LINE

Why do current conversational AI systems fail to develop shared vocabulary with users?


This explores why AI chat systems don't build up a shared way of talking with you over a conversation (adopting your words, your conventions, your references) the way two people naturally do. The corpus points to a single root cause that shows up in many guises: models are trained to predict information, not to do the relational work of conversation. The clearest case is lexical entrainment. Humans unconsciously converge on each other's word choices to build rapport and reduce ambiguity, and current response models simply don't; vocabulary adaptation toward the user is absent despite being foundational to human dialogue (see "Why don't conversational AI systems mirror their users' word choices?"). Shared vocabulary is one instance of a broader missing layer: the implicit maintenance techniques, such as reference repair, topic hand-offs, and convention-building, that keep dialogue smooth. These never develop because training signals reward predicting the next informative token, not sustaining a relationship (see "Why don't language models develop conversation maintenance skills?").
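To make "vocabulary adaptation toward the user" concrete, here is a minimal sketch of one way it could be measured. It is an illustration, not a metric taken from the cited notes: the function names, the tiny stopword list, and the example dialogues are all assumptions made up for this sketch.

```python
import re

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "it", "in",
             "i", "you", "my", "your", "that", "this", "can", "for", "on", "do"}

def content_words(text):
    """Return the set of lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def entrainment_score(turns):
    """Mean fraction of each assistant reply's content words the user had already used.

    `turns` is a list of (speaker, text) pairs, speaker in {"user", "assistant"}.
    """
    user_vocab = set()
    scores = []
    for speaker, text in turns:
        words = content_words(text)
        if speaker == "user":
            user_vocab |= words          # accumulate the user's vocabulary so far
        elif words:
            scores.append(len(words & user_vocab) / len(words))
    return sum(scores) / len(scores) if scores else 0.0

# Made-up dialogues: one reply reuses the user's term, the other swaps in a synonym.
adopting = [("user", "My sidecar keeps crashing"), ("assistant", "Restart the sidecar")]
ignoring = [("user", "My sidecar keeps crashing"), ("assistant", "Restart the auxiliary container")]

print(entrainment_score(adopting))  # 0.5
print(entrainment_score(ignoring))  # 0.0
```

A crude word-overlap score like this only captures surface reuse; it says nothing about whether the two speakers actually mean the same thing by the term.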

Look one level down and the failure is structural, not accidental. Standard RLHF optimizes for immediate, single-turn helpfulness, which actively discourages the behaviors that build shared ground: asking clarifying questions and establishing conventions that pay off in later turns. When the reward only looks at the next turn, the model has no incentive to invest in a vocabulary that becomes useful three turns from now (see "Why do language models respond passively instead of asking clarifying questions?"). The same logic makes agents structurally passive: they can't initiate, plan, or lead, because alignment optimizes for responding to queries rather than acting from goals, and co-building a shared language requires initiative, not just reaction (see "Why can't conversational AI agents take the initiative?").
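The incentive argument can be shown with a toy calculation. The sketch below compares a reward that scores only the next turn with one that sums discounted rewards over later turns. The discounted sum is a stand-in chosen for illustration, not the reward used by CollabLLM or any cited system, and the reward numbers and candidate names are invented.

```python
def single_turn_value(rewards):
    """Score a reply by the reward of the next turn only."""
    return rewards[0]

def multi_turn_value(rewards, gamma=0.9):
    """Score a reply by the discounted sum of rewards over the following turns."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Hypothetical per-turn reward trajectories; the numbers are made up.
candidates = {
    "answer_immediately": [0.6, 0.3, 0.3],       # looks helpful now, misfires later
    "ask_clarifying_question": [0.2, 0.9, 0.9],  # costs a turn, pays off afterwards
}

myopic_choice = max(candidates, key=lambda k: single_turn_value(candidates[k]))
farsighted_choice = max(candidates, key=lambda k: multi_turn_value(candidates[k]))

print(myopic_choice)      # answer_immediately
print(farsighted_choice)  # ask_clarifying_question
```

Under these made-up numbers the next-turn scorer never picks the clarifying question, which is the pattern the paragraph above describes.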

There's also a deeper architectural gap. Building shared vocabulary means tracking what both speakers believe and how those beliefs converge from partial to mutual understanding. Token-level LLMs lack the machinery for this; frameworks like Collaborative Rational Speech Acts (CRSA) add bidirectional belief tracking precisely because it's the information-theoretic layer LLMs don't have (see "Can dialogue systems track both speakers' beliefs across turns?"). Without a model of the other mind, there's no "shared" to converge on, only fluent output that sounds like it understands.
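For a feel of what modeling the other speaker looks like, here is a minimal sketch of the plain Rational Speech Acts (RSA) recursion on a toy reference game. It is not CRSA: it has no rate-distortion term and no belief tracking across turns, both of which the cited note attributes to CRSA. The objects, utterances, uniform prior, and function names are assumptions for illustration.

```python
OBJECTS = ["blue_square", "blue_circle", "green_square"]
UTTERANCES = ["blue", "green", "square", "circle"]

def is_true(utterance, obj):
    """An utterance is literally true of an object if it names one of its features."""
    return utterance in obj.split("_")

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def literal_listener(utterance):
    """L0: uniform belief over the objects the utterance is literally true of."""
    return normalize({o: 1.0 if is_true(utterance, o) else 0.0 for o in OBJECTS})

def pragmatic_speaker(obj, alpha=1.0):
    """S1: prefers utterances that lead the literal listener to the intended object."""
    return normalize({u: literal_listener(u)[obj] ** alpha for u in UTTERANCES})

def pragmatic_listener(utterance, alpha=1.0):
    """L1: infers the object by reasoning about which speaker would say this."""
    return normalize({o: pragmatic_speaker(o, alpha)[utterance] for o in OBJECTS})

print(literal_listener("blue"))    # blue_square and blue_circle tied at 0.5
print(pragmatic_listener("blue"))  # blue_square favored, since a speaker meaning
                                   # the circle would more likely have said "circle"
```

The pragmatic listener shifts from a 50/50 split to favoring the blue square purely by reasoning about the speaker's alternatives, which is the kind of inference a next-token objective does not explicitly reward.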

Here's the turn you might not expect: this isn't one bug but a category error baked into design. AI interfaces borrow conversational conventions, which switches on your lifelong communication skills, including the expectation that your partner will adopt your terms, but the system isn't actually communicating, just producing strings (see "Why do users fail with AI interfaces designed like conversations?"). And the fix isn't uniform: lexical alignment specifically drives task efficiency and comprehension, while emotional and prosodic alignment drive warmth and trust; conflating them produces cold or evasive bots (see "Do different types of alignment serve different conversational goals?"). The encouraging note is that shared vocabulary is teachable: post-training on coreference-identified user preferences can give models in-context convention formation (see "Why don't conversational AI systems mirror their users' word choices?"). The absence is a consequence of what we optimize for, not a hard limit of what these systems could do.
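The source note for that last claim names DPO as the post-training method. As a rough illustration, the sketch below computes the standard DPO loss for a single preference pair. How the cited work builds its pairs is not described here, so treating the "chosen" reply as one that reuses the user's term and the "rejected" reply as one that substitutes a synonym is an assumption, as are the log-probability values and the `beta` setting.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given summed sequence log-probs."""
    # How much more the policy favors each reply than the frozen reference model does.
    chosen_shift = policy_logp_chosen - ref_logp_chosen
    rejected_shift = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Hypothetical log-probabilities. Before training the policy matches the reference.
untrained = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# After training, the policy assigns more mass to the reply that reuses the user's term.
adapted = dpo_loss(-8.0, -12.0, -10.0, -10.0)

print(untrained)  # log(2), about 0.693
print(adapted)    # lower than the untrained loss
```

The loss falls as the policy comes to prefer the term-reusing reply more strongly than the reference model does, which is the sense in which convention formation becomes something training can reward.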


Sources (7 notes)

Why don't conversational AI systems mirror their users' word choices?

Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why can't conversational AI agents take the initiative?

Research shows that LLMs, including ChatGPT, cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not for creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Can dialogue systems track both speakers' beliefs across turns?

CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.

Why do users fail with AI interfaces designed like conversations?

AI interfaces that use conversational design conventions trigger users' lifelong communication skills, but AI doesn't actually communicate. This mismatch causes interaction failures that feel like user error but originate in design.

Do different types of alignment serve different conversational goals?

A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an AI researcher tracking conversational alignment in LLMs. The question: *Why do current conversational AI systems fail to develop shared vocabulary with users?* This remains contested.

What a curated library found — and when (dated claims, not current truth): Findings span 2021–2026, tracking evolving claims:
• Lexical entrainment (human word-choice convergence) is absent from response models despite being foundational to dialogue (2025).
• Standard RLHF rewards next-turn helpfulness, not multi-turn ground-building; models lack incentive to invest in conventions that pay off later (~2024–2025).
• LLMs lack bidirectional belief tracking; frameworks like Collaborative Rational Speech Acts add this layer that token-level models don't have natively (2025).
• Post-training on coreference-identified user preferences can enable in-context convention formation, suggesting absence is optimizable, not hard-wired (2025).
• Passivity by design: alignment optimizes for query-response, not goal-aware initiative; co-building shared language requires agents to lead, not just react (2024).

Anchor papers (verify; mind their dates):
• arXiv:2203.02155 (2022) — RLHF foundations
• arXiv:2307.16689 (2023) — third-position repair in dialogue
• arXiv:2507.14063 (2025) — Collaborative Rational Speech Acts
• arXiv:2602.07338 (2026) — intent mismatch in multi-turn

Your task:
(1) RE-TEST EACH CONSTRAINT. For each claim above, ask: have newer post-training methods (DPO, PPO variants), multi-turn orchestration (memory-augmented agents, caching), or agent frameworks (ReAct, tool-enabled loops) since relaxed or overturned it? Separate durable questions (e.g., *Can models track mutual belief states?*) from perishable limits (e.g., *Do they entrench in single-turn reward?*). Cite what shifted it.
(2) Surface the strongest CONTRADICTING work from the last 6 months—papers claiming shared vocabulary *does* emerge, or that passivity is a training artifact, not structural.
(3) Propose 2 research questions assuming the regime moved: e.g., *If multi-agent orchestration + long-context memory enable convention tracking, does entrainment emerge unsupervised?* or *Do language model mixtures specializing in relational tasks outpace monolithic models at ground-building?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines