INQUIRING LINE

The conversation around a sentence, not the words themselves, decides what listeners can challenge — and what sneaks past as assumed.

What role does discourse structure play in determining at-issueness?

This explores how a sentence's discourse context — what's being asked, what counts as background vs. foreground — decides whether a piece of meaning is 'at-issue' (the main point you can challenge) or backgrounded (slips past as assumed), and why that matters for projection and persuasion.

The cleanest answer in the corpus is that at-issueness is set by the Question Under Discussion (QUD). Across 19 English expressions, content projects (that is, survives embedding under negation, questions, and similar operators) to the degree that it does *not* address the current question, and the very same presupposition trigger projects more or less depending on context rather than on its fixed lexical type (note: Does projection strength vary by context or by word type?). So at-issueness isn't a property of a word; it's a property of the word's position in a discourse. Strip the word from its conversation and you can't predict its behavior.

That has a sharp consequence the corpus draws out: content that falls *outside* the at-issue part of an utterance is the content audiences don't scrutinize. Presuppositions persuade better than flat assertions precisely because they smuggle new claims in as already-accepted background, bypassing the evaluative attention that an at-issue assertion would attract (note: Why are presuppositions more persuasive than direct assertions?). In other words, discourse structure doesn't just sort meaning into 'main point' and 'side info'; it routes some content around the listener's defenses. At-issueness is a spotlight, and the persuasive move is to keep your real payload just outside the beam.

The flip side is that machines are bad at reading this spotlight, which is what makes it visible. In the study summarized here, ChatGPT did not adjust scalar implicatures to communicative context: it did not track focus, stakes, or face-threat, the very signals that shift what a listener takes as the point (note: Can language models adapt implicature to conversational context?). The same blindness shows up in argument: scheme classification stalls because it requires integrating inferential structure across distributed spans rather than spotting a local surface cue (note: Why does argument scheme classification stumble where other NLP tasks succeed?), and models lose the social standing that gives an expert claim its force because they see text, not the world the text sits in (note: Can language models distinguish expert arguments from common assumptions?). At-issueness, expertise, implicature: all of them are discourse-level, and all of them are where models slip.

What you might not expect is how far 'structure over content' generalizes once you follow it laterally. Conversation-level structural trajectories predict dialogue satisfaction almost as well as the full text (note: Can conversation structure predict dialogue success better than content?); discourse-level causal reasoning across statements predicts anxiety better than individual words (note: Why do discourse patterns predict anxiety better than single words?); and dialogue coherence breaks along relational failure modes such as contradiction, bad coreference, and irrelevance, which only appear at the level of how statements relate, not what any one says (note: What semantic failures break dialogue coherence most realistically?). At-issueness is one instance of a broader pattern: the meaning that matters lives in the relations between utterances, and the same content reads as central or peripheral depending on the conversational slot it lands in.

Sources 8 notes

Does projection strength vary by context or by word type?

Across 19 English expressions, projectivity varies continuously based on whether content addresses the Question Under Discussion. The same presupposition trigger projects more or less depending on context, not on fixed lexical properties.

Why are presuppositions more persuasive than direct assertions?

Experimental evidence shows presuppositions with additive, iterative, and factive triggers persuade audiences more than assertions, especially for discourse-new content. The mechanism: presuppositions bypass evaluative scrutiny by presenting claims as already-accepted background.

Can language models adapt implicature to conversational context?

ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.

Why does argument scheme classification stumble where other NLP tasks succeed?

Scheme classification requires recognizing inferential patterns across distributed text spans, not local surface features. Models plateau at F1 0.55–0.65 while the same systems exceed 0.80 on component tagging and stance, suggesting the integrative reasoning demand is fundamentally different.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

Can conversation structure predict dialogue success better than content?

TRACE achieved 68% accuracy predicting dialogue success from structural features alone, nearly matching a 70% content-based baseline. A hybrid combining both reached 80%, suggesting that how agents communicate rivals what they say.

Why do discourse patterns predict anxiety better than single words?

Causal explanations across statements—not individual words—are the strongest predictor of anxiety because anxious thinking involves overgeneralization through inter-statement reasoning. A dual model combining both representation levels outperforms either alone.

What semantic failures break dialogue coherence most realistically?

Research using Abstract Meaning Representation identified four distinct incoherence types: contradiction, coreference inconsistency, irrelevancy, and decreased engagement. AMR-trained classifiers detect these semantic failures while text-level manipulations alone cannot.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds (2.38 match)
Conversational Alignment with Artificial Intelligence in Context (2.36 match)
Can Language Models Recognize Convincing Arguments? (1.64 match)
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions (1.64 match)
LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High (1.64 match)
Argument Quality Assessment in the Age of Instruction-Following Large Language Models (1.63 match)
The social component of the projection behavior of clausal complement contents (1.63 match)
Persuasive presuppositions (1.63 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a discourse semanticist auditing claims about at-issueness and pragmatic blindness in LLMs. The precise question: **Does discourse structure (QUD, conversational position, relational trajectories) determine at-issueness in ways that current or near-future LLMs can capture?** Still open.

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2025; treat as perishable snapshots of capability gaps.
- At-issueness is set by the Question Under Discussion (QUD), not lexical type; presuppositional content persuades *more* when kept outside the at-issue spotlight, bypassing listener scrutiny (~2019–2023).
- LLMs fail to adjust scalar implicature to communicative context (focus, face-threat, stakes), and argument scheme classification stalls when it requires cross-span inferential integration rather than local cues (~2022–2024).
- Dialogue satisfaction is predicted almost as well by *conversational structure* as by content, and anxiety is predicted better by *discourse-level causal relations* than by word-level content; coherence breaks on relational failure modes (contradiction, coreference, relevance), not word meaning (~2022–2023).
- As of mid-2025, LLMs still struggle to reject false presuppositions under high-stakes misinformation, and confuse comprehension with persuasion susceptibility (~2025).
- Very recent work hints at interaction dynamics and debate-driven ambiguity detection as partial remedies (~2025).

Anchor papers (verify; mind their dates):
- arXiv:2210.14986 (2022): Pragmatic implicature fine-tuning; shows fine-grained context matters.
- arXiv:2312.03726 (2023): Social grounding via implicit modeling.
- arXiv:2505.22354 (2025): Presupposition rejection under stakes.
- arXiv:2511.08394 (2025): Interaction dynamics as reward signal.

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every finding above—especially the presupposition-persuasion loop and implicature blindness—judge whether newer architectures (o1, reasoning models), instruction-tuning, multi-agent orchestration, or reasoning-via-debate (2507.12370) have *relaxed* the gap. Separate the durable question (discourse structure's role) from the perishable claim (LLM incapability). Cite what resolved it.
(2) Surface the strongest **CONTRADICTING or SUPERSEDING work** from the last ~6 months. The 2025 batch suggests LLMs *may* be acquiring some pragmatic reasoning, which is in tension with the "blindness" narrative. Flag this conflict.
(3) Propose 2 research questions that **ASSUME the regime may have moved**: e.g., "If debate unlocks QUD-tracking, what discourse-level phenomena still resist it?" and "Can LLMs now distinguish presupposition *content* from *position* in persuasion, or only one?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
