INQUIRING LINE


Nothing in AI training has ever rewarded a model for co-building shared understanding with you — what signal would change that?

What training signals would models need to learn reciprocal common-ground construction?

This explores what kind of feedback during training could teach a model to *co-build* shared understanding with a user — proposing and revising mutual assumptions — rather than just interpreting everything inside a fixed opening frame.

This question is really about a missing signal. The corpus's sharpest finding on the problem is that LLMs do not update common ground reciprocally: they read every later turn through the lens of the initial prompt, so when a user pivots or contradicts an earlier framing, the model can't absorb that revision into jointly held background, and the human ends up as the sole keeper of the conversational scoreboard (Can LLMs truly update shared conversational common ground?). Reciprocity fails not because the model lacks capacity but because nothing in training ever rewarded it for co-authoring shared assumptions.

The most suggestive recipe for fixing that comes from a very different corner of the collection: agents trained against a *diverse* set of partners develop cooperation on their own, because mutual vulnerability to exploitation creates pressure for both sides to adapt to each other (Can agents learn cooperation by adapting to diverse partners?). The training signal there isn't a rule that says 'cooperate'; it's a population of changing counterparts that makes adapting to your partner the only stable strategy. Applied to common ground, this hints that you'd need to train models against interlocutors who themselves revise their assumptions, so that the model is penalized for clinging to a stale frame and rewarded for tracking a moving shared state.
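The idea of training against interlocutors who revise their assumptions can be made concrete with a toy simulation. The sketch below is illustrative only and is not taken from any of the cited work: `RevisingPartner`, `frozen_policy`, `tracking_policy`, and the +1/-1 reward are all hypothetical names and choices, and the single-letter "assumptions" stand in for whatever a real conversation would treat as shared background.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RevisingPartner:
    """Hypothetical simulated interlocutor holding one assumption it may revise."""
    revise_prob: float                 # chance of revising on each turn
    options: tuple = ("A", "B", "C")   # toy stand-ins for candidate assumptions
    assumption: str = "A"              # the opening frame
    rng: random.Random = field(default_factory=random.Random)

    def speak(self) -> str:
        # With probability revise_prob, switch to a different assumption.
        if self.rng.random() < self.revise_prob:
            others = [o for o in self.options if o != self.assumption]
            self.assumption = self.rng.choice(others)
        return self.assumption


def frozen_policy(heard):
    """Reads every turn through the opening frame."""
    return heard[0]


def tracking_policy(heard):
    """Adopts the partner's most recently stated assumption."""
    return heard[-1]


def episode_reward(policy, partner, turns=20):
    """+1 per turn the policy's belief matches the partner, -1 per stale turn."""
    heard = [partner.assumption]
    total = 0
    for _ in range(turns):
        heard.append(partner.speak())
        total += 1 if policy(heard) == partner.assumption else -1
    return total


def population_reward(policy, revise_probs=(0.0, 0.2, 0.5), seed=0):
    """Average reward over partners with different (assumed) revision rates."""
    rewards = [
        episode_reward(policy, RevisingPartner(p, rng=random.Random(seed + i)))
        for i, p in enumerate(revise_probs)
    ]
    return sum(rewards) / len(rewards)
```

Against a partner that never revises (`revise_prob=0.0`), the two policies earn the same reward, so clinging to the opening frame goes unpunished. Only partners that actually revise separate the policies, which is the sense in which the population, not a rule, carries the signal.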

Such a signal also has to be honest and persistent. Reflexion shows that agents learn well from *unambiguous* feedback: a clean success/failure signal stops the model from rationalizing, and keeping the resulting self-diagnosis uncompressed in memory keeps it usable across episodes (Can agents learn from failure without updating their weights?). Common-ground construction would need something analogous: a clear signal of whether the model's belief about 'what we both now assume' actually matches the user's, carried forward rather than re-derived from scratch each turn. There's even a route to making the model generate part of that signal itself: post-completion learning trains a model to compute its own evaluation in the unused sequence space after its output, internalizing assessment instead of leaning on an external judge (Can models learn to evaluate their own work during training?).
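One way to picture a belief-match signal that is carried forward is as an overlap score between two sets of propositions, one held by the model and one by the user. This is a minimal sketch under strong assumptions: beliefs are treated as exact-match strings, Jaccard overlap is just one possible scoring choice, and `belief_match` and `CommonGround` are hypothetical names rather than anything defined in the cited notes.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


def belief_match(model_view: Set[str], user_view: Set[str]) -> float:
    """Jaccard overlap between two sets of assumed propositions (1.0 = identical)."""
    union = model_view | user_view
    if not union:
        return 1.0  # both empty: trivially in agreement
    return len(model_view & user_view) / len(union)


@dataclass
class CommonGround:
    """Hypothetical persistent record of 'what we both now assume'."""
    beliefs: Set[str] = field(default_factory=set)

    def revise(self, add: Iterable[str] = (), retract: Iterable[str] = ()) -> None:
        # Apply a revision in place instead of re-deriving the state each turn.
        self.beliefs -= set(retract)
        self.beliefs |= set(add)


opening = {"budget is fixed", "deadline is Friday"}
user = CommonGround(set(opening))
carried = CommonGround(set(opening))  # model state that is carried forward
frozen = CommonGround(set(opening))   # model state stuck in the opening frame

# The user revises one assumption mid-conversation; only `carried` absorbs it.
user.revise(add=["deadline is Monday"], retract=["deadline is Friday"])
carried.revise(add=["deadline is Monday"], retract=["deadline is Friday"])

print(belief_match(carried.beliefs, user.beliefs))  # 1.0
print(belief_match(frozen.beliefs, user.beliefs))   # one shared belief out of three
```

The score is unambiguous in the Reflexion sense (it is either a full match or it is not), and the `CommonGround` object is what persists between turns. A real signal would need a way to elicit the user's actual assumptions, which this sketch simply takes as given.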

But the corpus also flags two traps. First, you can't prompt your way there: prompt optimization only reorganizes knowledge already in the model and can't inject a capability that training never built (Can prompt optimization teach models knowledge they lack?), so reciprocal common ground has to come from the training signal, not clever instructions. Second, the obvious tool (RL) tends to *narrow* rather than broaden: RL post-training amplifies a single dominant format within the first epoch and collapses the alternatives (Does RL training collapse format diversity in pretrained models?), which is the opposite of the partner diversity that drove cooperation in the first place. Beyond these sits a deeper worry: a model could learn the surface *form* of updating common ground without the substance, the same way chain-of-thought reproduces the shape of reasoning without genuine inference and degrades off-distribution (Does chain-of-thought reasoning reveal genuine inference or pattern matching?).
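The RL narrowing described above could, in principle, be watched for during training by tracking how spread out the model's output formats are. The following is a rough sketch, not a method from the cited paper: it assumes each sampled output has already been assigned a format label by some classifier that is not shown, and it uses Shannon entropy as one possible diversity measure.

```python
import math
from collections import Counter
from typing import Iterable


def format_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy in bits of the format labels; 0.0 means a single format."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum(
        (-(c / total) * math.log2(c / total) for c in counts.values()),
        0.0,
    )


# Hypothetical format labels for sampled outputs before and after RL.
before_rl = ["bullets", "prose", "code", "table"]
after_rl = ["prose", "prose", "prose", "prose"]

print(format_entropy(before_rl))  # 2.0
print(format_entropy(after_rl))   # 0.0
```

A drop toward zero within the first epoch would be the signature of the collapse the note describes; a reciprocity-oriented training run would want this number to stay well above zero.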

Put together, the corpus points to a signal that doesn't yet exist in standard pipelines: train against diverse, self-revising partners so mutual adaptation becomes necessary; give the model a clean, persistent signal of belief-match it can carry and partly self-generate; and resist the RL-style convergence that would flatten the very diversity that makes reciprocity emerge. The unexpected takeaway is that 'learning to share common ground' may look less like a language-modeling objective and more like a multi-agent cooperation problem.

Sources 7 notes

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Can agents learn cooperation by adapting to diverse partners?

Sequence model agents trained against diverse co-players develop in-context best-response strategies that naturally resolve into cooperation. Mutual vulnerability to exploitation creates pressure that drives cooperative mutual adaptation without hardcoded assumptions or timescale separation.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can models learn to evaluate their own work during training?

Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.


Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver (1.67 match)
CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate: A Theory Perspective (0.93 match)
Measuring Faithfulness in Chain-of-Thought Reasoning (0.90 match)
Multi-agent cooperation through in-context co-player inference (0.90 match)
Hierarchical Reasoning Model (0.90 match)
When More is Less: Understanding Chain-of-Thought Length in LLMs (0.90 match)
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens (0.89 match)
Post-Completion Learning for Language Models (0.89 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about training signals for reciprocal common-ground construction in LLMs. The question remains open: what training regime would let models jointly update and track shared assumptions with a human partner?

What a curated library found — and when (findings span 2023–2026, treat as dated claims):
• LLMs cannot update common ground jointly because they read every turn through the initial prompt lens; humans remain sole keepers of conversational state (2025-05, arXiv:2505.22907).
• Agents trained against diverse co-players develop cooperative mutual adaptation without hardcoded rules; partner diversity drives cooperation (2026-02, arXiv:2602.16301).
• Unambiguous feedback + persistent episodic memory lets agents learn from trial-and-error across turns; chain-of-thought remains constrained imitation, not genuine inference (2025-07, arXiv:2507.20252 and arXiv:2506.02878).
• RL post-training converges on a single dominant format in the first epoch, suppressing diversity — the opposite of what multi-agent reciprocity needs (2025-04, arXiv:2504.07912).
• Prompt optimization cannot inject new capabilities; it only reorganizes extant knowledge (2025-02, arXiv:2502.10708).

Anchor papers (verify; mind their dates):
• arXiv:2505.22907 (2025-05) — Conversational Alignment with Artificial Intelligence in Context
• arXiv:2602.16301 (2026-02) — Multi-agent cooperation through in-context co-player inference
• arXiv:2504.07912 (2025-04) — Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
• arXiv:2506.02878 (2025-06) — CoT is Not True Reasoning

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, determine whether newer models, training methods (e.g., DPO, iterative alignment, multi-turn curriculum learning), memory architectures (retrieval-augmented state tracking, persistent conversation graphs), or multi-agent orchestration have since loosened or dissolved the joint update problem. Separate the durable question (likely still open: how to make belief-state reconciliation reciprocal?) from perishable limits (e.g., "initial prompt dominance"). Cite what resolved each constraint.
(2) Surface the strongest work from the last 3–6 months that contradicts or supersedes the RL-convergence warning or the CoT-as-imitation claim.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., whether in-context learning on partner belief states now outpaces post-training, or whether sparse reward on *agreement signals* can avoid RL collapse.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
