INQUIRING LINE

Can structural conversation analysis replace text-based reward signals for AI alignment?


This line of inquiry asks whether the formal structures conversation analysts use to describe human dialogue (clarification sequences, repair, turn-taking) could replace the text-based reward signals, such as RLHF, that currently shape AI behavior. The honest answer from the corpus: probably not a clean replacement, but the question exposes exactly what text-based rewards are blind to, and that is the more interesting story.

The corpus is unusually pointed about *why* text-based rewards fail. Standard RLHF optimizes for immediate, next-turn helpfulness, which quietly trains models to answer passively rather than ask, probe, or discover what the user actually wants (Why do language models respond passively instead of asking clarifying questions?). The result is a structurally passive agent that can't initiate, plan, or lead (Why can't conversational AI agents take the initiative?). And because the training signal rewards predicting information, models never develop the implicit relational moves, such as reference repair and topic hand-off, that keep human conversations smooth (Why don't language models develop conversation maintenance skills?). So the case for bringing in structural analysis isn't aesthetic; it's that the text-reward objective is measuring the wrong thing.

Where conversation analysis earns its keep is as a *source of structure that text rewards can't see*. Insert-expansions, the clarifying sub-sequences humans use before answering, give a formal account of *when* an agent should stop and consult the user instead of silently chaining tools toward the wrong goal (When should AI agents ask users instead of just searching?). Proactivity, meaning offering relevant information unasked, mirrors Grice's conversational maxims and cut dialogue turns by 60% in simulations of medium-complexity domains, yet it is nearly absent from the datasets models train on (Could proactive dialogue make conversations dramatically more efficient?). These are structural diagnoses of what good dialogue requires, the kind of thing a token-level loss will never surface on its own.
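To make the when-to-ask idea concrete, here is a minimal Python sketch of an agent that opens an insert-expansion instead of calling a tool with incomplete information. Everything in it is an illustrative assumption rather than anything specified by the cited notes: the `REQUIRED_SLOTS` schema, the `book_flight` tool name, the `Turn` and `DialogueState` types, and the "missing slot" trigger are all hypothetical stand-ins for whatever signal a real system would use to detect an underspecified request.

```python
from dataclasses import dataclass, field

# Hypothetical tool schema (assumption): slots a tool needs before it is safe to call.
REQUIRED_SLOTS = {"book_flight": ("origin", "destination", "date")}


@dataclass
class Turn:
    speaker: str  # "user" or "agent"
    kind: str     # "request", "insert_question", "insert_answer", or "response"
    text: str


@dataclass
class DialogueState:
    tool: str
    slots: dict = field(default_factory=dict)


def missing_slots(state):
    """Return the required slots the user has not supplied yet."""
    return [s for s in REQUIRED_SLOTS[state.tool] if s not in state.slots]


def next_agent_turn(state):
    """Open an insert-expansion if the request is underspecified; otherwise respond."""
    missing = missing_slots(state)
    if missing:
        # Consult the user before the base request is answered,
        # rather than guessing and chaining tools toward the wrong goal.
        return Turn("agent", "insert_question", f"Before I continue: what is the {missing[0]}?")
    return Turn("agent", "response", f"Calling {state.tool} with {state.slots}")


state = DialogueState(tool="book_flight", slots={"origin": "OSL", "destination": "LIS"})
print(next_agent_turn(state).kind)   # the date is missing, so the agent asks first
state.slots["date"] = "2025-03-01"   # the user's insert_answer fills the gap
print(next_agent_turn(state).kind)   # now the agent can give the base response
```

The point of the sketch is only the shape of the sequence (request, inserted question, inserted answer, response); a slot check is one simple trigger among many possible ones.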

But 'replace' is the wrong verb, and the corpus keeps pointing to *richer reward formulations* rather than to abandoning learned signals. Multi-turn-aware rewards that estimate long-term interaction value already encode some of this structure back into RL (Why do language models respond passively instead of asking clarifying questions?). Unified policy learning folds what to ask, what to recommend, and when to do each into one trajectory-level objective, beating separated components precisely because conversation is a structured whole (Can unified policy learning improve conversational recommender systems?). And information-theoretic models like collaborative rational speech acts track *both* speakers' beliefs across turns, the bidirectional structure that token-level systems lack, offering a formal scaffold that could shape a reward rather than discard the idea of one (Can dialogue systems track both speakers' beliefs across turns?). Note too that reward signals don't have to be human text labels at all: model confidence can serve as an intrinsic reward (Can model confidence work as a reward signal for reasoning?), which suggests the real frontier is *what you reward*, not whether you reward.
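The contrast between a next-turn reward and a multi-turn-aware one can be illustrated with a toy Python sketch. It is not the method of any cited work: the user simulator, the `TURN_PENALTY` value, the per-turn scores in `next_turn_reward`, and the 50/50 ambiguity are all invented for illustration. The only idea carried over from the text is that a response is scored by the estimated value of the whole conversation that follows it, here approximated by averaging sampled rollouts.

```python
import random
from statistics import mean

TURN_PENALTY = 0.1  # assumed cost per dialogue turn


def next_turn_reward(action):
    """Stand-in for a per-turn helpfulness score (assumed values).

    In isolation, a direct answer looks more helpful than a clarifying question.
    """
    return {"answer": 1.0, "ask": 0.3}[action]


def rollout(action, rng):
    """Simulate the rest of a toy conversation and return its total turn count."""
    if action == "ask":
        return 2  # clarifying question, then a correct answer
    # The user's intent is ambiguous between two readings, so a direct answer is a guess.
    guessed_right = rng.random() < 0.5
    return 1 if guessed_right else 5  # a wrong guess triggers a 4-turn repair


def multi_turn_reward(action, rng, n_samples=500):
    """Monte Carlo estimate of conversation-level value after taking `action`."""
    return mean(1.0 - TURN_PENALTY * rollout(action, rng) for _ in range(n_samples))


rng = random.Random(0)
for action in ("answer", "ask"):
    print(action, next_turn_reward(action), round(multi_turn_reward(action, rng), 2))
```

Under these assumed numbers the per-turn score prefers answering immediately, while the rollout-based estimate prefers asking, which is the reversal the paragraph above describes.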

The deepest doubt in the corpus cuts against both sides. One line of argument holds that alignment can't be guaranteed by manipulating symbols at all: without indexical grounding and social mediation, a model's stated goals can drift from real values no matter how the signal is shaped (Can AI systems achieve real alignment without world contact?). A companion note argues AI doesn't even produce genuine utterances, only 'event-residue' that humans animate into a pseudo-exchange (Does AI generate genuine utterances or just text patterns?). If that's right, conversation-analytic structure is a description of a human achievement the model is only imitating: useful for diagnosing failures and designing better rewards, but not a substitute for the grounding that makes alignment mean something. The takeaway: the most promising move isn't replacing text rewards with structure; it's letting conversation analysis tell us which structures our rewards have been silently failing to count.


Sources (10 notes)

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why can't conversational AI agents take the initiative?

Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Can unified policy learning improve conversational recommender systems?

Research shows that formulating attribute-asking, item-recommending, and timing decisions as a single graph-based RL policy achieves better joint optimization than isolated components. Separation prevents gradient signals from informing one another and fails to optimize conversation trajectory holistically.

Can dialogue systems track both speakers' beliefs across turns?

CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.

Can AI systems achieve real alignment without world contact?

Peircean semiotics reveals that symbolic goal encoding without world contact and social mediation cannot guarantee correspondence to actual values. LLMs operating in pure symbol manipulation risk divergence between stated goals and real-world outcomes.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
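The confidence-as-reward note above describes ranking reasoning traces by answer-span confidence to build synthetic preferences. A rough Python sketch of that ranking step follows. The `Trace` type, the geometric-mean confidence measure, and the `min_gap` filter are assumptions made for illustration, not details taken from the RLSF work itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    reasoning: str
    answer: str
    answer_logprobs: list  # per-token log-probabilities over the answer span


def confidence(trace):
    """Geometric-mean token probability of the answer span (an assumed measure)."""
    return math.exp(sum(trace.answer_logprobs) / len(trace.answer_logprobs))


def preference_pairs(traces, min_gap=0.1):
    """Turn a confidence ranking into (chosen, rejected) pairs.

    Pairs whose confidence gap is below `min_gap` are skipped as too close to call.
    """
    ranked = sorted(traces, key=confidence, reverse=True)
    pairs = []
    for i, chosen in enumerate(ranked):
        for rejected in ranked[i + 1:]:
            if confidence(chosen) - confidence(rejected) >= min_gap:
                pairs.append((chosen, rejected))
    return pairs


traces = [
    Trace("step-by-step derivation", "42", [-0.05, -0.05]),
    Trace("guess with little support", "41", [-1.0, -1.2]),
    Trace("shorter derivation", "42", [-0.1]),
]
for chosen, rejected in preference_pairs(traces):
    print(f"prefer {chosen.reasoning!r} over {rejected.reasoning!r}")
```

The resulting pairs could feed any preference-based training step; no human labels or external verifier appear anywhere in the loop, which is the property the note emphasizes.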

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

As an AI alignment researcher, evaluate whether structural conversation analysis can *replace* or *augment* text-based reward signals (RLHF, DPO) for steering model behavior—and whether this distinction still matters given recent capability progress.

What a curated library found — and when (dated claims, not current truth):
Findings span 2021–2026; treat as entry points, not settled fact.
• Text-reward optimization is structurally passive: models trained on next-turn helpfulness rarely initiate, ask clarifying questions, or detect intent mismatch—a regime limitation, not an architecture one (2023–2024).
• Insert-expansions and repair sequences offer *formal structure* for when agents should pause and consult users rather than chain silently toward wrong goals; this structure is invisible to token-level losses (2023).
• Proactive dialogue (offering unrequested but relevant info) cuts conversation turns by ~60% but is nearly absent from training corpora; it mirrors Grice's maxims yet isn't emergent from RLHF (2024–2025).
• Multi-turn-aware rewards and unified policy learning (what-to-ask, what-to-recommend, when-to-speak as a single trajectory objective) already encode conversational structure back into RL, outperforming modular approaches (2021, 2025).
• Model confidence and self-feedback can serve as intrinsic rewards, suggesting the frontier is *what you reward*, not whether you reward text labels (2025).
• A deeper doubt: alignment via symbol manipulation alone may be insufficient without indexical grounding; LLMs produce 'event-residue' humans animate, not genuine utterances (2024–2026).

Anchor papers (verify; mind their dates):
• 2307.01644 — Insert-expansions for tool-enabled agents (2023; foundational for when-to-ask structure)
• 2501.00383 — Proactive conversational agents with inner thoughts (2024; proactivity gap)
• 2507.14063 — Collaborative rational speech acts (2025; bidirectional belief tracking)
• 2602.07338 — Intent mismatch in multi-turn dialogue (2026; grounding failures)

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every finding above—passivity, repair-blindness, proactivity absence, reward-structure encoding—judge whether newer models (o1, GPT-4o, Claude 3.5+), in-context scaffolding (system prompts, chain-of-thought variants), agentic loops (reflection, planning), or new RL methods (outcome supervision, preference learning from trajectories, self-play) have *relaxed* or *dissolved* these limits. Separate the durable question (how do we align multi-turn behavior?) from the perishable constraint (next-turn rewards are passive). Cite what method or capability did the work.
(2) **Surface the strongest contradicting or superseding work from the last ~6 months.** Has anyone shown conversation-analytic structure can be *directly* encoded into model weights, or that simple in-context priming handles repair/proactivity better than training-time interventions? Flag disagreement with the library's "replace" skepticism.
(3) **Propose 2 research questions that assume the regime has moved.** E.g.: "If proactive agents are now emergent in frontier models without explicit structure, what *actually* changed—scale, diversity of dialogue in pretraining, or post-training curriculum?" Or: "Can we reverse-engineer which conversation-analytic primitives modern RL actually recovers?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
