INQUIRING LINE

How do question acts and intents map to speech act theory?

This explores how the things we call 'question acts' and 'intents' in dialogue systems line up with classical speech act theory — the idea that an utterance is an action (asking, asserting, promising) rather than just information transfer.


This reads the question as: when a system labels an utterance as a 'question' or tags it with an 'intent,' how does that bookkeeping relate to what speech act theory says the utterance is doing? The corpus's sharpest answer is that the mapping is leakier than it looks: the surface form of an act and its communicative function routinely come apart. The cleanest illustration is in clarification research, where mechanisms are mapped onto Clark's four-level action ladder (attention, signal, meaning, action). The striking finding is that *most clarifications are declarative, not interrogative*: a statement like 'I assumed you meant the API' functions as a question. Systems that detect 'question acts' by syntax alone miss them entirely (see: Why do clarification requests look different at each communication level?). So the act-to-intent map breaks at exactly the point speech act theory cares about: illocutionary force isn't carried by grammatical mood.
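To make the syntax-only failure concrete, here is a minimal sketch. The function `looks_like_question` and its word list are hypothetical, invented for illustration and not taken from any cited system. The declarative example is the one quoted in the paragraph above.

```python
# Hypothetical syntax-only "question act" detector (illustrative only).
QUESTION_OPENERS = {
    "who", "what", "when", "where", "why", "how", "which",
    "do", "does", "did", "is", "are", "can", "could", "should", "would",
}

def looks_like_question(utterance: str) -> bool:
    """Flag an utterance as a question from surface form alone."""
    text = utterance.strip().lower()
    if not text:
        return False
    first_word = text.split()[0].strip(",.!?")
    # Surface cues only: a trailing question mark or an interrogative opener.
    return text.endswith("?") or first_word in QUESTION_OPENERS

examples = [
    "Which API did you mean?",        # interrogative clarification
    "I assumed you meant the API.",   # declarative clarification
]
flags = [looks_like_question(u) for u in examples]
print(flags)  # → [True, False]
```

Both utterances ask the hearer to confirm or correct a reading, but only the first is flagged, because the detector sees grammatical mood and not function.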

Where the corpus gets deeper is in formalizing intent as pragmatic reasoning rather than as a label. The Rational Speech Acts (RSA) tradition treats an utterance as a move chosen by a speaker reasoning about a listener who is in turn reasoning about the speaker; intent is recovered through recursive modeling, not read off the words. CRSA extends this to multi-turn dialogue with bidirectional belief tracking, which is precisely the machinery needed for the progression from partial to shared meaning, something a flat intent classifier can't represent (see: Can dialogue systems track both speakers' beliefs across turns?). This is the constructive answer to your question: speech acts map cleanly onto intent only once you model the listener's uptake, not just the speaker's output.
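The recursive reasoning described above can be sketched with the standard single-turn RSA recursion on a toy reference game. This is plain RSA, not CRSA: it has no multi-turn belief tracking. The objects, utterances, uniform prior, and `ALPHA` value are all assumptions chosen for illustration.

```python
# Toy single-turn RSA: literal listener -> pragmatic speaker -> pragmatic listener.
OBJECTS = ["blue_square", "blue_circle", "green_square"]   # assumed toy world
UTTERANCES = ["blue", "green", "square", "circle"]         # assumed vocabulary
ALPHA = 1.0  # speaker rationality (assumed)

def is_true(utterance: str, obj: str) -> bool:
    """An utterance is true of an object if it names one of its features."""
    return utterance in obj.split("_")

def normalize(scores: dict) -> dict:
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

def literal_listener(utterance: str) -> dict:
    """L0: uniform over the objects the utterance is literally true of."""
    return normalize({o: 1.0 if is_true(utterance, o) else 0.0 for o in OBJECTS})

def pragmatic_speaker(obj: str) -> dict:
    """S1: prefers utterances that lead L0 to the intended object."""
    return normalize({u: literal_listener(u)[obj] ** ALPHA for u in UTTERANCES})

def pragmatic_listener(utterance: str) -> dict:
    """L1: inverts S1 under a uniform prior over objects."""
    return normalize({o: pragmatic_speaker(o)[utterance] for o in OBJECTS})

l0 = literal_listener("blue")
l1 = pragmatic_listener("blue")
print(l0["blue_square"], l1["blue_square"])
```

The literal listener splits 'blue' evenly between the two blue objects. The pragmatic listener shifts toward `blue_square` (about 0.6), reasoning that a speaker who meant the circle would more likely have said 'circle'. The words are identical in both cases; only the model of the speaker changes the recovered intent.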

And uptake is exactly what the corpus says LLMs lack. Speech act theory (in its Habermasian version) holds that an utterance is genuine speech only if it raises validity claims (truth, rightness, sincerity) with real stakes; on that test LLM output isn't speech at all, and the model is neither speaker nor interlocutor (see: Can LLMs raise validity claims in Habermas's sense?). That reframes 'intent' entirely: a model can be tagged as performing a request without being an agent capable of meaning one. Several notes converge here: we talk *at* models rather than *to* them because they continue tokens rather than take up commitments (see: Are we really communicating with language models?), and producing contextually appropriate text passes behavioral tests while missing the relational-normative conditions speech acts require (see: Does behavioral speech output prove communicative subjecthood?).

There's a practical edge worth pulling out: the 'question act' a model *fails* to perform is the clarifying question. Standard RLHF rewards immediate single-turn helpfulness, which trains models to assert confidently instead of asking. This collapses grounding acts (the acknowledgments, repairs, and checks that constitute mutual understanding) to roughly 77.5% below human rates (see: Do language models actually build shared understanding in conversation?), an 'alignment tax' on the very acts speech act theory treats as load-bearing (see: Does preference optimization harm conversational understanding?). Multi-turn-aware reward designs that estimate long-term interaction value restore active intent discovery: the model starts asking instead of presuming (see: Why do language models respond passively instead of asking clarifying questions?). The same optimization that flattens questions into assertions also biases the model's *reading* of others' intents, for example by projecting conciliatory persuasion onto speakers regardless of context (see: Do LLMs predict persuasion based on actual dialogue or training bias?).
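The contrast between single-turn and multi-turn-aware scoring can be sketched as a toy calculation. The per-turn scores below are invented for illustration and are not measurements from any cited work. The discount factor and the `discounted_return` helper are likewise assumptions, and this is not CollabLLM's actual reward implementation.

```python
# Toy comparison: score a response by its first turn only vs. by the
# discounted value of the conversation it leads to. All numbers are invented.

def discounted_return(rewards, gamma=0.9):
    """Sum of per-turn rewards, discounting later turns by gamma**t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Hypothetical per-turn helpfulness scores for two opening strategies.
trajectories = {
    "answer_now": [0.8, 0.1, 0.3],  # confident guess, then repair turns
    "ask_first":  [0.3, 0.9, 0.9],  # clarifying question, then on-target help
}

single_turn = {name: r[0] for name, r in trajectories.items()}
multi_turn = {name: discounted_return(r) for name, r in trajectories.items()}

best_single = max(single_turn, key=single_turn.get)
best_multi = max(multi_turn, key=multi_turn.get)
print(best_single, best_multi)
```

Under these assumed numbers the single-turn score prefers `answer_now`, while the discounted multi-turn value prefers `ask_first`. The ranking flips only because the clarifying question pays off in later turns, which a single-turn reward never sees.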

The thing you might not have known you wanted: in this corpus, 'speech act' and 'intent' don't bottom out in a taxonomy of utterance types; they bottom out in whether anyone is on the hook for what was said. Several notes argue subjecthood itself is *produced* inside communicative events rather than possessed beforehand (see: Does language create subjects or express them?), which means the right question isn't 'what speech act is this' but 'is there a speaker here at all.'


Sources: 10 notes

Why do clarification requests look different at each communication level?

Research maps clarification mechanisms to four levels of communication—attention, signal, meaning, action—each grounded in a different modality (socioperception, hearing, vision, kinesthetics). Most clarifications use declarative form, not questions, making them invisible to systems that detect by syntax alone.

Can dialogue systems track both speakers' beliefs across turns?

CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.

Can LLMs raise validity claims in Habermas's sense?

Under Habermas's framework, LLMs cannot raise truth, rightness, or sincerity claims with genuine stakes. Without validity claims, their output fails to qualify as speech, making them non-speakers and non-interlocutors by definition.

Are we really communicating with language models?

LLMs process tokens and generate continuations rather than receive and take up communication. The preposition 'to' presupposes an addressee capable of mutual orientation and shared commitment, which LLMs cannot provide, so Chalmers' investigation rests on an unwarranted linguistic foundation.

Does behavioral speech output prove communicative subjecthood?

Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives, like puppets that make walk-shaped movements without walking.

Do language models actually build shared understanding in conversation?

LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Do LLMs predict persuasion based on actual dialogue or training bias?

LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.

Does language create subjects or express them?

Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a pragmatics researcher evaluating whether speech act theory's account of communicative intent still maps cleanly onto modern LLM intent detection and dialogue behavior. The question: do systems that label utterances as 'questions' or tag them with 'intents' actually recover the illocutionary force speech act theory describes?

What a curated library found — and when (dated claims, not current truth): findings span 2021–2026.
• Most clarifications are *declarative*, not interrogative in form — surface grammar divorces speech act from intent (2021–23).
• LLMs presume shared ground rather than build it, producing grounding acts (clarifications, acknowledgments, repairs) ~77.5% less often than humans, an alignment tax on grounding acts (2024).
• Standard RLHF rewards immediate assertion over epistemic humility, biasing models toward predicting conciliatory persuasion regardless of context (2024).
• Multi-turn-aware reward designs that estimate long-term interaction value restore active intent discovery and question-asking (2025).
• LLMs fail uptake — utterances don't raise validity claims with real stakes; intent-tagging conflates behavioral appropriateness with genuine communicative agency (2024–26).

Anchor papers (verify; mind their dates):
• arXiv:2104.08964 (2021) — grounded clarifications
• arXiv:2407.08790 (2024) — mistaking engineering for linguistic agency
• arXiv:2507.14063 (2025) — Collaborative Rational Speech Acts
• arXiv:2602.07338 (2026) — Intent Mismatch in multi-turn dialogue

Your task:
(1) RE-TEST EACH CONSTRAINT. For the claim that 'LLMs lack uptake' and cannot genuinely perform speech acts: has recent work on multi-modal grounding, tool use, or agentic architectures (e.g., systems that commit to states, track consequences) narrowed or closed this gap? Separate the durable claim (illocutionary force requires real stakes / commitment) from what may have shifted (e.g., can structured dialogue with verifiable outcomes or multi-agent loops simulate uptake?). Cite what changed it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially any that argue LLMs do recover pragmatic intent, or that reframe 'agency' to not require intentionality.
(3) Propose 2 research questions that assume the regime may have moved: e.g., 'Do LLM-to-LLM collaborative loops with consequence-tracking reconstruct uptake?' or 'Does grounding-aware RL recover question acts?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.
