Why do clarification requests look different at each communication level?
Explores whether clarification requests form a single speech act or are distinct mechanisms grounded in different modalities. This matters because dialogue systems that treat clarifications uniformly are likely to miss most of them.
"A Recipe for Annotating Grounded Clarifications" (Benotti & Blackburn 2021, arXiv:2104.08964) maps dialogue clarification mechanisms onto Clark's (1996) action ladder of communication, revealing that clarifications are not a uniform speech act but are grounded in distinct modalities at each level:
- Level 1 (attention): grounded in socioperception — achieving joint attention
- Level 2 (signal): grounded in hearing — recognizing what was said
- Level 3 (meaning): grounded in vision — recognizing referents in the world
- Level 4 (action): grounded in kinesthetics — moving and acting in the world
Causality flows upward through these levels: attention must be achieved before the signal can be recognized, the signal must be recognized before meaning can be, and meaning must be recognized before the action can be taken up. A clarification at a given level is triggered when positive evidence of understanding at that level is absent.
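The upward dependency described above can be sketched as a small lookup. This is a minimal illustration, not code from the paper: the names `Level`, `MODALITY`, and `lowest_ungrounded_level` are hypothetical, and treating a missing entry as "no positive evidence" is an assumption made for the sketch.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Level(IntEnum):
    """Clark's action ladder, numbered as in the list above (hypothetical names)."""
    ATTENTION = 1
    SIGNAL = 2
    MEANING = 3
    ACTION = 4


# Modality that grounds clarifications at each level, as listed in this note.
MODALITY = {
    Level.ATTENTION: "socioperception",
    Level.SIGNAL: "hearing",
    Level.MEANING: "vision",
    Level.ACTION: "kinesthetics",
}


def lowest_ungrounded_level(evidence: Mapping[Level, bool]) -> Optional[Level]:
    """Return the lowest level lacking positive evidence of understanding.

    Because each level enables the one above it, the first gap found when
    scanning from the bottom is where a clarification would be triggered.
    Assumption: a level absent from `evidence` counts as lacking evidence.
    """
    for level in Level:  # IntEnum iterates in definition order: 1, 2, 3, 4
        if not evidence.get(level, False):
            return level
    return None  # positive evidence at every level: no clarification needed


# Attention and signal are established, but the referent was not identified.
evidence = {Level.ATTENTION: True, Level.SIGNAL: True, Level.MEANING: False}
blocked = lowest_ungrounded_level(evidence)
```

Here `blocked` is `Level.MEANING`, so under these assumptions the clarification would be grounded in `MODALITY[blocked]`, that is, vision.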
Key implications:
Humans switch between clarifications grounded in different modalities seamlessly but systematically. The most common realization of a clarification request is declarative rather than interrogative, so surface form is an unreliable indicator of clarification function. Dialogue systems, including LLM-based ones, that find clarification needs by detecting questions will therefore miss most clarifications.
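The gap between form and function can be made concrete with a toy example. The sketch below is an illustration only: the detector `looks_like_question` is a hypothetical stand-in for surface-form question detection, and the four turns and their labels are invented for this note, not drawn from the paper's corpus.

```python
def looks_like_question(turn: str) -> bool:
    """Naive surface-form check: a final '?' or an interrogative opener."""
    openers = (
        "what", "which", "who", "where", "when", "why", "how",
        "do", "does", "did", "is", "are", "can", "could",
    )
    text = turn.strip().lower()
    first_word = text.split(" ", 1)[0]
    return text.endswith("?") or first_word in openers


# Invented turns, each hand-labelled with whether it functions as a
# clarification request (True) or not (False).
turns = [
    ("Which button?", True),              # interrogative clarification
    ("I see two buttons here.", True),    # declarative clarification
    ("I can't reach that lever.", True),  # declarative clarification
    ("Okay, pressing it now.", False),    # plain acknowledgement
]

# Clarification requests that the form-based detector fails to flag.
missed = [text for text, is_clarification in turns
          if is_clarification and not looks_like_question(text)]
```

On these invented turns the detector flags only the interrogative one, so `missed` holds both declarative clarifications. The proportions here are arbitrary; only the failure pattern is the point.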
The paper's formal recipe — "a subsequent turn is a clarification grounded in modality m if it cannot be preceded by positive evidence of understanding in m" — provides a testable criterion that could inform dialogue system design.
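Read as an annotation test, the recipe can be written as a simple predicate over human judgments. This is a sketch under stated assumptions rather than the authors' tooling: `AnnotatedTurn`, `can_follow_positive_evidence`, and `grounded_modalities` are hypothetical names, and the judgments themselves are assumed to come from an annotator, not to be computed.

```python
from dataclasses import dataclass
from typing import Dict, List

# Modalities in ladder order, as listed in this note.
MODALITIES = ("socioperception", "hearing", "vision", "kinesthetics")


@dataclass
class AnnotatedTurn:
    """A subsequent turn plus one annotator judgment per modality.

    can_follow_positive_evidence[m] is True when the turn could coherently
    be preceded by positive evidence of understanding in modality m.
    """
    text: str
    can_follow_positive_evidence: Dict[str, bool]


def grounded_modalities(turn: AnnotatedTurn) -> List[str]:
    """Apply the recipe: the turn is a clarification grounded in m
    if it cannot be preceded by positive evidence of understanding in m."""
    unjudged = [m for m in MODALITIES
                if m not in turn.can_follow_positive_evidence]
    if unjudged:
        raise ValueError(f"missing judgments for: {unjudged}")
    return [m for m in MODALITIES
            if not turn.can_follow_positive_evidence[m]]


# Invented example: the turn would be incoherent after visual confirmation
# of the referent, but coherent after evidence in the other modalities.
turn = AnnotatedTurn(
    text="I see two buttons here.",
    can_follow_positive_evidence={
        "socioperception": True,
        "hearing": True,
        "vision": False,
        "kinesthetics": True,
    },
)
result = grounded_modalities(turn)
```

With these invented judgments, `result` contains only `"vision"`. Requiring a judgment for every modality is a design choice of the sketch, meant to keep an unjudged modality from being silently read as grounded.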
This extends "Why do language models skip the calibration step?" by specifying that repair itself is multimodal and hierarchical, not a single mechanism. It also connects to "Do language models actually build shared understanding in conversation?": text-only LLMs cannot ground clarifications at levels 1, 3, or 4 because they lack the relevant modalities (socioperception, vision, kinesthetics), leaving only level 2 (text-as-signal) available.
Inquiring lines that use this note as a source (13)
This note is a source for these synthesized inquiries.
- Why do comprehensive posts without uncertainty tend to suppress conversation?
- What makes the prompt a fundamentally new kind of speech act?
- Why do dialogue systems fail to detect declarative clarification requests?
- How do humans decide which level of clarification to request?
- What makes some clarifying questions more useful than others?
- When should agents use clarification commands instead of assuming intent?
- How do question acts and intents map to speech act theory?
- Can conversation analysis predict when agents should ask users for clarification?
- What structural changes enable agents to ask clarifying questions?
- What is the difference between static and dynamic grounding in dialogue?
- Why do specific clarifying questions outperform generic requests for clarity?
- How might dual-process dialogue use information gain to trigger clarification?
- Which types of clarifying questions actually help users versus wasting their time?
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- A recipe for annotating grounded clarifications
- “Mama Always Had a Way of Explaining Things So I Could Understand”: A Dialogue Corpus for Learning to Construct Explanations
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
- Grounding Gaps in Language Model Generations
- "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline
- Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
- Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions
- Conversational Alignment with Artificial Intelligence in Context
Original note title
clarification mechanisms are grounded in distinct modalities that follow Clark's action ladder: socioperception, hearing, vision, and kinesthetics