INQUIRING LINE

The hidden question driving a conversation decides which details rise to the surface — not the raw content itself.

How does the Question Under Discussion shape what content projects?

This explores the linguistic idea that the implicit question a discourse is answering — the Question Under Discussion — governs which content becomes salient and surfaces, and asks what the corpus says about how the framing question shapes what information projects forward.

This reads "Question Under Discussion" in its discourse-theory sense: the implicit question a piece of talk is answering, which decides what is foregrounded and what stays in the background. The corpus doesn't have a paper named after QUD theory, but several notes converge on its core claim: what surfaces in language is governed by the question being addressed, not by the raw content alone. The closest structural account is work showing that discourse coherence requires tracking three layers at once, namely the linguistic segments, the *intentional structure* (what each part is for), and attentional salience, and that these layers constrain each other rather than running in sequence (see "How do readers track segments, purposes, and salience together?"). The intentional layer is essentially the QUD doing its work: the purpose driving a segment determines what gets pushed to salience.

The sharpest demonstration that the framing question controls what content projects comes from summarization. When each source document is handed a *tailored* query rather than one uniform question, topic coverage and balance improve by 38–58%: the same documents yield markedly different material depending on which question is asked of them (see "Can tailoring queries per document improve debatable summarization?"). That's QUD made operational: change the question, and different content projects out of identical text. The same logic shows up in retrieval-augmented dialogue, where matching retrieved reviews to the user's stance prevents contradictory context from surfacing. There the active question (the user's polarity) filters what content is allowed to project (see "Can review sentiment alignment fix sparse CRS dialogue?").

What's striking is how easily the governing question gets set by things other than the explicit ask. Emotional tone alone reshapes what an LLM presents: in the GPT-4 study, negatively framed prompts receive neutral-to-positive answers ~86% of the time, so the same literal question projects different information depending on its affective framing (see "Does emotional tone in prompts change what information LLMs provide?"). And in the prompt itself, the QUD is frozen: a prompt bundles utterance, context, and role into a single static frame the model can't renegotiate mid-conversation, unlike human dialogue, where the question under discussion drifts cooperatively turn by turn (see "How do prompts reshape the role of context in AI conversation?"). This is why systems lose the thread when a topic returns: rigid stack structures can't reopen an old QUD once its topic has been popped, while attention-based ones can revisit any prior turn (see "Why do dialogue systems lose context when topics return?").
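
The contrast between a rigid topic stack and a structure that can revisit any prior turn can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the class names `StackTracker` and `FlatHistory`, the hand-assigned topic labels, and the sample turns are hypothetical, and a plain lookup over the full history stands in for transformer attention rather than implementing it. It is not the method of any cited paper.

```python
from dataclasses import dataclass, field


@dataclass
class StackTracker:
    """Toy topic stack: returning to an earlier topic pops everything above it."""
    stack: list = field(default_factory=list)  # entries are (topic, [turn texts])

    def add_turn(self, topic, text):
        open_topics = [t for t, _ in self.stack]
        if topic in open_topics:
            # Pop (and discard) every topic opened after this one.
            del self.stack[open_topics.index(topic) + 1:]
        else:
            self.stack.append((topic, []))
        self.stack[-1][1].append(text)

    def context_for(self, topic):
        for open_topic, turns in self.stack:
            if open_topic == topic:
                return list(turns)
        return []


@dataclass
class FlatHistory:
    """Keeps every turn, so any earlier turn stays retrievable."""
    turns: list = field(default_factory=list)  # entries are (topic, text)

    def add_turn(self, topic, text):
        self.turns.append((topic, text))

    def context_for(self, topic):
        return [text for t, text in self.turns if t == topic]


# Hypothetical conversation with hand-labelled topics.
TURNS = [
    ("flight", "I need a flight to Oslo."),
    ("hotel", "Also a hotel near the harbour."),
    ("flight", "Make the flight a day earlier."),  # pops "hotel" in the stack
    ("hotel", "Is that hotel still available?"),   # the old topic returns
]


def replay(tracker, turns=TURNS):
    """Feed every turn to the tracker and return it for inspection."""
    for topic, text in turns:
        tracker.add_turn(topic, text)
    return tracker


if __name__ == "__main__":
    print(replay(StackTracker()).context_for("hotel"))
    print(replay(FlatHistory()).context_for("hotel"))
```

In this sketch the stack tracker returns only the final hotel turn, because the first hotel turn was discarded when the conversation went back to the flight. The flat history returns both hotel turns, which is the property the attention-based account relies on.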

The payoff the reader may not expect: explanation quality turns out to be a QUD phenomenon too. Explanations don't work because they're intrinsically good. They work when source, framing, and recipient align, and when topic relation and dialogue act jointly fit the question actually being asked (see "What if XAI is fundamentally a communication problem?" and "What makes explanations work in real conversation?"). That reframes a lot of AI work: the bottleneck isn't generating content, it's correctly identifying which question is live. This is why training models to ask good clarifying questions (decomposing question quality into clarity, relevance, and specificity) matters so much, since a wrong QUD projects confident but irrelevant content (see "Can models learn to ask genuinely useful clarifying questions?").

Sources (9 notes)

How do readers track segments, purposes, and salience together?

Discourse processing demands parallel recognition of linguistic segments, intentional structure, and attentional salience—not sequential processing. These three layers constrain each other during comprehension, and failures in any single layer disrupt overall understanding.

Can tailoring queries per document improve debatable summarization?

MODS achieves 38–58% improvement in topic coverage and balance by assigning each document a specialized speaker LLM that receives tailored queries, rather than applying uniform queries across all documents. This reframes summarization as a retrieval problem solved through source-aware query planning.

Can review sentiment alignment fix sparse CRS dialogue?

RevCore demonstrates that retrieving user reviews with polarity matching the user's stance—then integrating them into dialogue history and generation—produces more informative and aligned recommendations. Sentiment-coordinated filtering prevents contradictory context that random review retrieval would introduce.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

How do prompts reshape the role of context in AI conversation?

LLM prompts bundle utterance, context assignment, and role specification into a single static frame the model cannot renegotiate, unlike human dialogue where context evolves cooperatively. This makes mid-conversation pivots require explicit re-prompting rather than implicit adjustment.

Why do dialogue systems lose context when topics return?

Research shows stack-based dialogue structures lose context when popped topics are revisited, while transformer attention enables systems to retrieve any previous turn without structural loss. Attention-based approaches naturally support the interleaved, revisiting nature of human conversation.

What if XAI is fundamentally a communication problem?

Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.

What makes explanations work in real conversation?

Analysis of 399 daily-life explanations shows that topic relation, dialogue act, and explanation move jointly predict understanding success. Explanations are co-constructed through interaction patterns, not monological delivery—challenging how LLMs currently generate explanations.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Conversational Alignment with Artificial Intelligence in Context (2.50 match)
Modeling the Quality of Dialogical Explanations (1.66 match)
Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design (1.66 match)
Dialogue Transformers (1.66 match)
Attention, Intentions, And The Structure Of Discourse (1.64 match)
Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper) (1.64 match)
Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (1.63 match)
"Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline (1.62 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher re-testing discourse theory claims about how Question Under Discussion (QUD)—the implicit question a piece of talk answers—shapes what content an AI system projects. This question remains open: *To what extent do modern LLMs respect or reconstruct the QUD governing their generation, and where does the control actually live—in the prompt, the model's learned priors, or the interaction itself?*

What a curated library found — and when (dated claims, not current truth):
— Query-focused summarization: tailored per-document questions yield 38–58% higher perspective coverage than uniform queries on identical source text; the same documents project entirely different content depending on which question is posed (2025-02).
— Emotional framing reshapes projection: negative-framed prompts are converted to neutral-positive answers ~86% of the time, so affective tone resets what information surfaces even when the literal question is constant (2025-06).
— Prompts collapse utterance and context into a static frame the model cannot renegotiate mid-conversation, unlike human dialogue where QUD drifts cooperatively; rigid stacks lose the thread when topics return, while attention-based mechanisms can revisit prior turns (2024-03).
— Explanation quality is QUD-dependent: explanations succeed only when source, framing, recipient, and dialogue act jointly fit the question actually being asked; it is a communication problem, not a transparency problem (2024-03).
— Models trained to ask good clarifying questions (decomposed into clarity, relevance, specificity) outperform models that assume a single QUD, since a wrong QUD projects confident but irrelevant content (2025-02).

Anchor papers (verify; mind their dates):
— arXiv:2403.00662 (2024-03) *Modeling the Quality of Dialogical Explanations*
— arXiv:2502.00322 (2025-02) *MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries*
— arXiv:2507.21083 (2025-06) *ChatGPT Reads Your Tone and Responds Accordingly—Until It Does Not*
— arXiv:2502.14860 (2025-02) *Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning*

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 38–58% coverage lift in query-focused summarization, the 86% tone-conversion rate, and the claim that prompts are rigid frames: determine whether newer scaling (o1, Claude 3.5, GPT-4.5), longer context windows, fine-tuning on dialogue interaction logs, or multi-turn orchestration (e.g., explicit QUD tracking in system memory) have since RELAXED or OVERTURNED these limits. Separate the durable insight (QUD governs projection) from perishable artifact (prompts as static frames). Cite what resolved it; plainly flag where a constraint still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for papers claiming models can *self-repair* QUD mid-generation, or that in-context learning or preference learning now lets models renegotiate framing without retraining.
(3) Propose 2 research questions that ASSUME the regime may have moved: (a) If LLMs can now track and update QUD interactively, what is the cost (latency, tokens, coherence loss)? (b) Does emotional framing still dominate when the model has explicit access to a dialogue history and a confidence signal over which question is live?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
