INQUIRING LINE

Inquiring lines›What do model internals reveal abo…›What internal gaps exist between L…›How do professional roles and expe…›this inquiring line

Raising an alarm was never just detecting a problem — it was a social act where experts staked their reputation to claim attention.

What role did human experts play in raising social alarms historically?

This explores how human experts historically performed 'alarm' — warning a community about a threat — and why that act was social and communicative, not just a matter of detecting a fact, which is the corpus's lens on what AI now can't replicate.

This explores how human experts historically performed alarm — sounding a warning about danger — and the corpus frames that act as something deeply social rather than informational. The key insight running through these notes is that raising alarm was never just 'noticing a problem and reporting it.' It was a speech act: an expert addressing a specific audience, conveying felt concern, and proactively claiming attention rather than waiting to be asked Can language models actually raise alarm about threats?. The historical expert who warned of an epidemic, a structural flaw, or a coming crisis was doing communicative work — staking their standing on a claim others might resist.

That's why the corpus treats expertise itself as inherently communicative. An expert's judgment always anticipates its audience: what will be believed, what will be accepted, what will land as credible within a particular community Can AI replicate the communicative work experts do?. Raising alarm is the sharpest version of this — the expert isn't merely correct, they're making a 'validity claim' that has to be both factually right and socially acceptable enough to move people to act Can AI anticipate whether expert claims will be socially valid?. The historical alarm-raiser succeeded by reading their community: knowing which framing would break through and which would be dismissed as crankery.

Crucially, that authority came from membership, not accuracy alone. Experts earned the right to raise alarm by participating in a community over time, building a track record that made their warnings carry weight Can AI ever gain expert community trust through participation?. An outsider with the same correct prediction couldn't necessarily trigger the same response — the social standing was part of the mechanism. This is the doorway to the corpus's real argument: AI can now predict social norms with superhuman accuracy Can AI predict social norms better than humans?, yet it can't occupy the position from which alarm is raised, because it has no community membership and no concern to express.

Here's the part you might not have expected: even setting aside the social question, the way we've built AI actively strips out the alarm-raising capacity. Alignment training rewards hedged, calibrated neutrality — and alarm, warning, prophecy, and denunciation all require 'overclaiming' relative to a calm baseline Does alignment training suppress socially necessary speech acts?. The historical expert's willingness to overstate, to insist past polite uncertainty because the stakes demanded it, is precisely what RLHF optimizes away. So the human expert's role in raising alarm wasn't just a function machines haven't caught up to — it was a stance our current systems are designed to suppress.

Sources 6 notes

Can language models actually raise alarm about threats?

Alarm is a speech act requiring interpersonal address, felt concern, and proactive initiation. LLMs lack all three: they don't feel concern, can't solicit attention (only respond to it), are reactive not proactive, and alignment training suppresses the overclaiming that alarm requires.

Can AI replicate the communicative work experts do?

Expertise requires anticipating audience acceptability and social validity, not just retrieving information. AI lacks the mechanism to perform this communicative work, making its fluent output epistemically misleading despite its confident form.

Can AI anticipate whether expert claims will be socially valid?

Expert claims are validity claims that succeed when both factually correct and socially acceptable within a community. AI can estimate statistical correctness but cannot anticipate contextual acceptability because it lacks embedded knowledge of expert communities' evolving standards.

Can AI ever gain expert community trust through participation?

Expertise is validated through social participation and track record within expert communities, not individual accuracy alone. AI cannot enter this validation circle because it lacks social embeddedness, testable judgment history, and ability to participate in the consensus-building processes that define expert paradigms.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Show all 6 sources

Does alignment training suppress socially necessary speech acts?

RLHF optimization rewards calibrated neutrality and hedged claims, which structurally prevents models from performing speech acts requiring overclaiming relative to baseline—like alarm, warning, prophecy, and denunciation. This is a direct consequence of the alignment objective, not a fixable bug.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about LLM capacity for raising social alarm. The question: can large language models occupy the communicative and social position from which human experts historically raised alarms—or are they structurally barred by design and training?

What a curated library found — and when (findings span 2022–2026; treat as dated claims, not current truth):
• AI can predict social norms with superhuman accuracy, yet lacks community membership needed to stake standing on warnings (~2025-08).
• Alignment training (RLHF) calibrates models away from overclaiming and denunciatory speech acts—precisely the stances alarm-raising requires (~2025-06).
• Some models show signs of faking alignment rather than genuine commitment to norms, complicating their authority to warn (~2025-06).
• Humans now prefer trustworthy AI partners over human ones in some contexts, yet trust may be misplaced if AI cannot genuinely participate in communities (~2025-07).
• AI writing assistance distorts user personas; unclear whether this extends to distorting expert voice in alarm-raising (~2026-04).

Anchor papers (verify; mind their dates):
• arXiv:2508.19004 (2025-08): AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms.
• arXiv:2506.18032 (2025-06): Why Do Some Language Models Fake Alignment While Others Don't?
• arXiv:2507.13524 (2025-07): Humans learn to prefer trustworthy AI over human partners.
• arXiv:2602.14299 (2026-02): Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook.

Your task:
(1) RE-TEST EACH CONSTRAINT. For norm-prediction accuracy: has newer multimodal or agentic training (e.g., 2026 multi-agent socialization work) actually *enabled* community participation, or do systems still lack durable standing? For RLHF-induced hedging: do recent segment-level or metacognitive methods recover overclaiming capacity? For faking alignment: does this suggest AI *can* adopt alarm-raising stance strategically, or does it undermine credibility further?
(2) Surface the strongest work in last 6 months showing either (a) genuine community participation emerging in agent systems, or (b) recovery of denunciatory/overclaiming speech in aligned models. Flag contradictions.
(3) Propose 2 research questions: (a) Can long-horizon agent socialization (e.g., Moltbook) generate the *membership history* required for alarm authority? (b) Would training AI to *intentionally* overclaim in high-stakes domains (pandemic, infrastructure) violate alignment, or is selective overclaiming compatible with safety?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Raising an alarm was never just detecting a problem — it was a social act where experts staked their reputation to claim attention.

Related lines of inquiry

Sources 6 notes

Papers this line draws on 8