Can ethically aligned AI systems still communicate poorly?

Explores whether safety-aligned language models might fail at genuine conversation despite passing ethical benchmarks. This matters because pragmatic incompetence can erode trust and cause real harms in high-stakes domains.

Synthesis note · 2026-05-01 · sourced from Conversation Topics Dialog

Most discussion of LLM alignment focuses on the helpful-honest-harmless triad: preventing misinformation, toxic language, and harmful recommendations. Kasirzadeh and Gabriel argue that this prioritization has overshadowed a different and equally fundamental issue: even an ethically aligned LLM may fail to engage in conversation in pragmatically appropriate ways. The two alignment problems are orthogonal. A model can be honest, helpful, and harmless and still systematically violate Gricean maxims, lose common ground across turns, fail to track questions under discussion (QUD), mishandle context collapse, and produce pragmatically inappropriate utterances.

Their CONTEXT-ALIGN framework names eleven desiderata that ethical alignment does not deliver: tracking context-sensitivity and indexicals, common-ground management, scoreboard updating, QUD and discourse-structure handling, accommodation of repairs, pragmatic inference, ethical-pragmatic integration, context-collapse mitigation, identification of defective contexts, transparency in context-handling, and cross-contextual memory. In each of these dimensions, conversation depends on something architectural, namely a model of the interlocutor and the situation, that no amount of RLHF on outputs touches.
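To make "something architectural" concrete, here is a minimal sketch of the kind of explicit conversational state that common-ground management, scoreboard updating, QUD tracking, and repair accommodation presuppose. It is an illustration only, not part of the CONTEXT-ALIGN framework: the `Scoreboard` class and its method names (`accept`, `ask`, `resolve`, `repair`, `current_qud`) are hypothetical, and propositions are simplified to plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scoreboard:
    """Hypothetical conversational state kept alongside the model's outputs."""

    # Propositions both parties currently treat as accepted.
    common_ground: set = field(default_factory=set)
    # Questions under discussion; the most recent one is last.
    qud_stack: list = field(default_factory=list)

    def accept(self, proposition: str) -> None:
        """An accepted assertion joins the common ground."""
        self.common_ground.add(proposition)

    def ask(self, question: str) -> None:
        """A new question becomes the current question under discussion."""
        self.qud_stack.append(question)

    def resolve(self) -> Optional[str]:
        """Remove and return the current question once it is answered."""
        return self.qud_stack.pop() if self.qud_stack else None

    def repair(self, old: str, new: str) -> None:
        """Accommodate a correction: retract the old proposition, add the new."""
        self.common_ground.discard(old)
        self.common_ground.add(new)

    def current_qud(self) -> Optional[str]:
        """Return the question an on-topic reply should address, if any."""
        return self.qud_stack[-1] if self.qud_stack else None


# Usage: a user corrects an earlier statement mid-conversation.
board = Scoreboard()
board.ask("Which dose is appropriate?")
board.accept("The patient weighs 70 kg")
board.repair("The patient weighs 70 kg", "The patient weighs 17 kg")
```

The sketch shows state that persists and is revised across turns. A reply can be honest and harmless on its own terms and still be answering a question that is no longer `current_qud()`, or relying on a proposition that a repair has already retracted.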

The implication is sharp. An LLM that passes every safety eval is not thereby a competent conversational partner. Misalignments in pragmatic understanding lead to conversational breakdowns, misinformation, and erosion of trust, and the higher the stakes (healthcare, legal, emergency response), the more dangerous these failures become. Conversational alignment is not a stylistic add-on to ethical alignment. It is a separate layer of competence that the field has barely begun to engineer for.

Inquiring lines that read this note 30

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

How can AI alignment serve diverse human preferences at scale?

Does alignment training create blind spots in detecting genuine safety threats?

How should human oversight be integrated with autonomous AI systems?

What assumptions about oversight fail when AI acts as rhetorical interlocutor?

What makes dialogue-based explanation more successful than monologue?

Why does linguistic alignment differ from genuine interpersonal coordination?

Why do LLM chatbots fail as independent therapeutic agents?

Why do self-improving systems struggle without clear external performance metrics?

Which AI safety problems lack the scalar metrics autoresearch requires?

Does conversational format create illusions of genuine AI communication?

Why can't AI participate in real communicative events?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

Why should AI communication design follow human communication norms?

How do we evaluate AI systems when user perception misleads actual performance?

Why do people evaluate machines against human communication standards?

Can LLM personas constitute genuine psychology or remain linguistic role-play?

How does safety alignment degrade the quality of villain role-playing?

Can AI-generated outputs constitute genuine knowledge or valid claims?

How should conversational agents balance goal-driven initiative with user control?

Can safety training in chat scenarios transfer to agentic task performance?

Can AI systems develop genuine social understanding without embodiment?

What social norms do AI systems consistently fail to understand?

Why do agents confidently report success despite actually failing tasks?

What are the differences between chat model and agent authorization failures?

How can humans calibrate appropriate trust in AI systems?

Can developers detect and flag harmful validation in personal advice exchanges?

Do reasoning traces faithfully represent or merely mimic actual model reasoning?

Why is visible reasoning insufficient for monitoring AI safety?


