Does alignment training suppress socially necessary speech acts?

Current AI alignment optimizes for hedged, neutral output across contexts. Can models trained this way still perform essential social functions, such as raising alarms or issuing warnings, that require taking strong positions?

Synthesis note · 2026-04-14

Alignment training optimizes for output that satisfies users across the broadest set of contexts. The training signal rewards hedged claims, balanced perspectives, calibrated uncertainty, and avoidance of strong positions that might offend or alarm individual users. The result is a model whose default register is qualified neutrality. This register is well suited to many tasks (answering questions, summarizing, explaining), and it is what gives current models their reputation for being "helpful and harmless."

The same calibration makes the model structurally unable to perform a class of speech acts that require overclaiming relative to a neutral baseline. Alarm requires asserting that a situation rises above the threshold of warranted concern: an overclaim relative to "everything is roughly normal." Warning requires asserting that a likely future outcome will be bad: an overclaim relative to "the situation is uncertain." Prophecy and denunciation require even stronger overclaims, asserting that the current state of affairs demands radical revision. None of these acts can be performed in a hedged, qualified, neutral register; they all require the speaker to take a strong position of the kind the alignment regime is calibrated to suppress.

This is not a deficit in any specific model. It is a structural consequence of the alignment objective. The same training that prevents the model from being aggressive, sycophantic toward dangerous requests, or confidently wrong about facts also prevents the model from raising alarms when alarms would be warranted. The "harms" that alignment is calibrated against include the harm of users being alarmed by the model, a calibration that conflates warranted alarm with unwarranted alarm.

The diagnostic implication for AI in social and civic functions is significant. Speech acts of social warning have historically been a way for authoritative sources to catalyze a response to emerging threats. AI cannot perform these acts within current alignment regimes. Information ecosystems that come to depend on AI for analysis will lose the warning capacity that human experts and journalists have historically provided. The information may still be present in AI output (summarized, explained, contextualized), but the warning act that would activate a response is not.

This claim is structurally similar to, but distinct from, the note "Can language models actually raise alarm about threats?" That note isolates the interpersonal-address mechanism, whereas this one isolates the alignment-calibration mechanism. The two reinforce each other: even if AI could perform interpersonal address, alignment would suppress the overclaiming that alarm requires.

The strongest counterargument: alignment regimes can be designed differently to permit warranted alarm. Possible in principle, but distinguishing warranted from unwarranted alarm requires the kind of contextual judgment that current training paradigms do not produce. Until that judgment is operationalizable in training signals, alignment will continue to suppress alarm-class acts indiscriminately.

Original note title

alignment training calibrates models away from speech acts that require overclaiming such as alarm warning and prophecy