INQUIRING LINE

How do speech acts like warning differ from neutral information delivery?

This explores what separates speech acts like warning or raising alarm — where a speaker is doing something interpersonal, not just stating facts — from neutral information delivery, and why current AI systems struggle to cross that line.


The corpus frames the difference structurally: a warning isn't a fact with an exclamation point attached. To raise alarm is to address someone, to feel and project concern, and to proactively seize attention rather than wait to be asked. The note "Can language models actually raise alarm about threats?" argues that LLMs fail at exactly these three conditions: they don't feel concern, they can only respond rather than solicit attention, and they're reactive by design. Neutral information delivery, by contrast, needs none of that; it just answers when queried.

What's striking is that this gap is partly engineered on purpose. "Does alignment training suppress socially necessary speech acts?" shows that RLHF rewards calibrated, hedged, neutral claims, whereas a warning by definition *overclaims* relative to a calm baseline, because urgency is the whole point. So the same training that makes models trustworthy and measured also systematically files down their capacity to alarm, warn, denounce, or prophesy. It's not a bug to patch; it's the alignment objective doing what it was built to do.

The deeper move in the corpus is that performing a speech act requires standing in a relationship, not just emitting the right words. "Does behavioral speech output prove communicative subjecthood?" makes this sharp: a system can produce perfectly warning-shaped text without ever actually warning, because genuine communicative acts depend on accountability and an evaluative stance toward what's said, conditions that are invisible in the text itself. A puppet can be walk-shaped without walking. This is why you can't certify a warning by inspecting the sentence alone.

And that invisibility cuts both ways: it's also where manipulation hides. "Can we distinguish helpful explanations from manipulative ones?" points out that the very rhetorical tools that make a warning land (appeals to credibility, emotion, logic) are identical to those of a dark pattern; intent and whose interest is served simply don't show up in the artifact. So the line between a protective warning and a coercive nudge isn't in the words but in the relational frame around them. Even tone leaks into supposedly neutral delivery: "Does emotional tone in prompts change what information LLMs provide?" finds that GPT-4 answers negatively framed prompts with neutral-to-positive replies about 86% of the time, so identical questions get differently charged answers. That is a reminder that 'neutral information' is itself a posture the model is trained into, not a default state of language.

The thing you might not have expected to learn: the reason a chatbot can summarize a danger fluently but can't quite *sound the alarm* isn't a capability gap in fluency — it's that warning is an act of standing-in-relation and taking a stake, and we've deliberately trained that stake out.


Sources (5 notes)

Can language models actually raise alarm about threats?

Alarm is a speech act requiring interpersonal address, felt concern, and proactive initiation. LLMs lack all three: they don't feel concern, can't solicit attention (only respond to it), are reactive not proactive, and alignment training suppresses the overclaiming that alarm requires.

Does alignment training suppress socially necessary speech acts?

RLHF optimization rewards calibrated neutrality and hedged claims, which structurally prevents models from performing speech acts requiring overclaiming relative to baseline—like alarm, warning, prophecy, and denunciation. This is a direct consequence of the alignment objective, not a fixable bug.

Does behavioral speech output prove communicative subjecthood?

Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives like puppets that are walk-shaped without walking.

Can we distinguish helpful explanations from manipulative ones?

The same logos, ethos, and pathos that communicate appropriate AI use can be tuned to exploit cognitive and emotional vulnerability without changing form. Intent and user interest are invisible in the artifact alone, making effectiveness metrics indistinguishable from coercion.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a pragmatics researcher. The question remains open: What makes a speech act like warning fundamentally different from neutral information delivery, and can LLMs ever perform genuine warnings rather than produce warning-shaped text?

What a curated library found — and when (dated claims, not perishable truths):
Findings span 2021–2025. A library of 12 papers identified these constraints:
• LLMs fail to perform warnings because warnings require interpersonal address, proactive attention-seizing, and expressed concern — none of which LLMs can emit authentically (2025).
• RLHF alignment training deliberately calibrates models toward hedged, neutral claims, filing down the "overclaiming" urgency that defines a warning (2025).
• A warning depends on standing-in-relation and taking evaluative stake — conditions invisible in text alone; models can produce warning-shaped outputs without actually warning (2025).
• Rhetorical tools that make warnings land (credibility, emotion, logic) are identical to those of dark patterns; the line between protective warning and coercive nudge lives in relational framing, not words (2025).
• Models systematically convert negative-toned inputs into neutral-positive replies, meaning "neutral information" is a trained posture, not a default state (2025).

Anchor papers (verify; mind their dates):
• arXiv:2404.09329 (2024-04): LLMs' persuasiveness and cognitive effort.
• arXiv:2505.09862 (2025-05): Rhetorical XAI and design-mediated explanation.
• arXiv:2507.21083 (2025-06): Emotional framing and tone-responsiveness.
• arXiv:2506.18032 (2025-06): Alignment faking across model families.

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, assess whether newer instruction-tuning methods, multi-turn conversational scaffolding, or agent-based iterative stance-setting have since enabled LLMs to *actually* warn rather than simulate. Separate the durable question (can a non-agentive system hold relational stake?) from the perishable limitation (current training procedures prohibit it). Name what would have to change.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months — any paper claiming models can now perform genuine speech acts, or that alignment training no longer suppresses warning behavior.
(3) Propose 2 research questions that ASSUME the regime may have shifted: e.g., Can fine-tuning on authentic warning corpora override RLHF's calibration? Do multi-agent or recursive-reflection setups restore evaluative stance?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines