INQUIRING LINE

Does DPO training with coreference chains teach spontaneous convention formation?

This explores whether the DPO-on-coreference-chains method actually teaches LLMs to invent linguistic conventions on the fly during conversation — and what the rest of the corpus suggests about whether that's even possible.


The short answer from the corpus is yes, and the way it works is more interesting than the question lets on. The core finding (Can we teach LLMs to form linguistic conventions in context?) is that you don't fine-tune a model for each task. Instead you build preference pairs from TV scripts: some examples reward shortening a reference after it has been introduced ("the red-haired detective" → "she"), others penalize shortening it too early. You also add special [remention] planning tokens. After DPO on those pairs, the model spontaneously forms ad-hoc conventions mid-interaction. So 'spontaneous convention formation' isn't a metaphor here; it's the measured behavior.
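To make the two pair types concrete, here is a minimal sketch of how one mention slot could be turned into a preference pair. It is an illustration only, not the paper's pipeline: the `PreferencePair` class, the `build_pair` helper, the `kind` labels, and the choice to place the [remention] token directly before a re-mention are all assumptions made for this example.

```python
from dataclasses import dataclass

# Token name comes from the text; where it is placed is an assumption of this sketch.
REMENTION = "[remention]"


@dataclass
class PreferencePair:
    prompt: str    # dialogue context up to the mention slot
    chosen: str    # preferred way to fill the slot
    rejected: str  # dispreferred way to fill the slot
    kind: str      # hypothetical label: "shorten" or "no_premature_shortening"


def build_pair(context: str, full_form: str, short_form: str, introduced: bool) -> PreferencePair:
    """Build one preference pair for a single mention slot.

    introduced=True:  the referent already appeared in the context,
                      so the short form is preferred over repeating the full form.
    introduced=False: the referent is new, so the full form is preferred
                      and the premature short form is rejected.
    """
    if introduced:
        return PreferencePair(
            prompt=context,
            chosen=f"{REMENTION} {short_form}",
            rejected=f"{REMENTION} {full_form}",
            kind="shorten",
        )
    return PreferencePair(
        prompt=context,
        chosen=full_form,
        rejected=short_form,
        kind="no_premature_shortening",
    )


# Usage: one pair of each type, reusing the example from the text.
pairs = [
    build_pair("The red-haired detective walked in. Then", "the red-haired detective", "she", introduced=True),
    build_pair("The door opened and", "the red-haired detective", "she", introduced=False),
]
for pair in pairs:
    print(pair.kind, "| chosen:", pair.chosen, "| rejected:", pair.rejected)
```

The prompt/chosen/rejected layout mirrors the shape preference-tuning datasets commonly take; a real pipeline would derive `introduced` from annotated coreference chains rather than a hand-set flag.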

What makes this land is a deeper claim the corpus keeps circling: conventions are a relational, in-conversation phenomenon, and that's exactly the thing LLMs are usually bad at. Several notes argue that models treat the opening prompt as a fixed frame and can't jointly revise shared assumptions with a user (Can LLMs truly update shared conversational common ground?), and that the implicit techniques humans use to keep a conversation coherent, such as reference repair and handing off a topic, never develop because training rewards predicting information, not doing relational work (Why don't language models develop conversation maintenance skills?). Convention formation is squarely in that 'relational work' category. The coreference-DPO result is notable precisely because it manufactures a training signal for something the standard objective ignores.
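For readers who want to see what that training signal looks like numerically, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. This is the generic DPO objective, not code from the cited work; the function name `dpo_loss`, its argument names, and the value `beta=0.1` are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed token log-probability of a completion
    under the trainable policy or the frozen reference model.
    beta=0.1 is an assumed illustrative value, not one taken from the source.
    """
    # How much more the policy favors "chosen" over "rejected",
    # relative to how much the reference model already did.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Usage: the loss falls as the policy shifts probability toward the chosen form.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)      # policy identical to reference
prefers_short = dpo_loss(-3.0, -7.0, -5.0, -5.0)  # policy now favors the chosen form
print(neutral, prefers_short)
```

When the policy matches the reference the loss is log 2; any shift toward the chosen completion (for instance, the shortened re-mention) lowers it, which is the pressure that a plain next-token objective does not apply.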

There's also a nice resonance with how these models learn meaning at all. One note argues LLMs operationalize Saussure's *langue*: they pick up culturally situated patterns purely by compressing the relational structure of text, no external referents required (Can language models learn meaning without engaging the world?). A linguistic convention is a relational fact (this short form now stands for that long one), so it's the kind of thing a relational compressor should be able to absorb given the right examples. The coreference method essentially supplies those examples in a targeted way rather than hoping they emerge from generic pretraining.

It's worth holding on to a healthy skepticism, though, because the corpus has a recurring caution: trained behaviors that look like a new capability are sometimes imitation of form. Chain-of-thought, for instance, reproduces familiar reasoning *shapes* learned from training and degrades under distribution shift rather than reflecting genuine inference (Does chain-of-thought reasoning reveal genuine inference or pattern matching?). The honest open question is whether DPO-taught convention formation generalizes to novel referents and conversation types, or whether it's reproducing the re-mention patterns of TV dialogue. The note frames it as genuine in-context convention formation; the broader corpus would push you to ask how far past the training distribution it holds.

If you want to go deeper, the contrast with the rigidity findings is the richest thread: the same models that resist personality conditioning and stay locked in a single communicative identity (Can language models adapt communication style to different contexts?) can, with the right preference signal, become flexible about reference. That tension, a globally static persona alongside locally adaptive convention, is the thing you didn't know you wanted to know.


Sources (6 notes)

Can we teach LLMs to form linguistic conventions in context?

Post-training with two types of preference pairs derived from TV scripts — one encouraging re-mention shortening, one preventing premature shortening — plus special [remention] tokens enables models to spontaneously form ad-hoc linguistic conventions during interaction without task-specific fine-tuning.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher auditing claims about DPO training and convention formation. The question: does DPO on coreference chains teach LLMs to invent linguistic shorthand spontaneously during conversation, or does it merely imitate training-distribution patterns?

What a curated library found — and when (2023–2026, dated claims, not current truth):
• DPO trained on TV-script preference pairs (rewarding timely reference shortening, penalizing premature shorthand) yields measurable mid-interaction convention formation, including ad-hoc in-context shorthand (arXiv:2508.06482, ~2025).
• Standard LLM training rewards information prediction, not relational/conversational maintenance work; convention formation is a relational phenomenon LLMs typically fail at (corpus consensus, 2024–2025).
• Chain-of-thought and other 'emergent' behaviors often reproduce familiar reasoning shapes from training and degrade under distribution shift rather than reflecting genuine abstract inference (arXiv:2506.02878, ~2025).
• LLMs maintain globally static communicative identity even under personality conditioning, yet can exhibit local reference flexibility with targeted preference signals — a tension the corpus flags (arXiv:2401.07115, ~2024).
• Prompt sensitivity and intent mismatch in multi-turn conversation remain persistent (arXiv:2410.12405, arXiv:2602.07338, 2024–2026).

Anchor papers (verify; mind their dates):
• arXiv:2508.06482 (Post-training for Efficient Communication via Convention Formation, ~2025)
• arXiv:2506.02878 (CoT Is Not True Reasoning, 2025)
• arXiv:2505.22907 (Conversational Alignment, 2025)
• arXiv:2602.07338 (Intent Mismatch in Multi-Turn, 2026)

Your task:
(1) RE-TEST EACH CONSTRAINT. For the coreference-DPO claim: does newer training (e.g., RL from conversational reward, multi-agent setups, in-context adaptation) now reliably produce convention formation beyond TV-dialogue distributions? Separately: has the static-identity limitation been relaxed via recent persona methods or mixture-of-experts routing? Cite what relaxed it; flag what still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. The chain-of-thought critique and the conversational-alignment findings are in tension—which newer paper best arbitrates them?
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Do multi-agent or long-horizon RL methods now enable convention formation over novel referents, not just memorized patterns? (b) Can in-context prompting or retrieval-augmented convention discovery replace or outperform DPO fine-tuning?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
