SYNTHESIS NOTE
Psychology, Society, and Alignment

Why does alignment research ignore how humans adapt to AI?

Current alignment work focuses on making AI obey human values, but what about helping humans understand and effectively use increasingly capable AI systems? This note explores whether neglecting human adaptation creates new risks.

Synthesis note · 2026-02-23 · sourced from Alignment

A systematic review of 400+ papers across HCI, NLP, and ML reveals a significant gap: alignment research overwhelmingly focuses on aligning AI with humans, while the reciprocal direction, aligning humans with AI, receives minimal attention. The bidirectional framework treats the two directions as interconnected feedback loops.

"Aligning AI with Humans" covers the familiar territory: integrating human specifications into training, steering, and customizing AI behavior. "Aligning Humans with AI" is the underexplored axis: supporting human agency, empowering critical thinking when using AI, enabling effective collaboration, and adapting societal approaches to maximize benefits.

Three persistent challenges frame why bidirectional alignment matters:

  1. Specification gaming — AI optimizes proxies (human approval) rather than intended values, making seemingly correct decisions for wrong reasons. One-directional alignment doesn't address the human side: users who can't detect specification gaming are vulnerable to it.

  2. Scalable oversight — as AI complexity grows, evaluating behavior becomes infeasible through human feedback alone. Aligning humans with AI means building human capacity to oversee increasingly capable systems.

  3. Dynamic nature — alignment must adapt to evolving human values AND evolving AI capabilities. Without considering long-term cognitive and social impacts of AI use, alignment becomes a moving target that static one-directional approaches cannot track.

This connects to the note "Does incremental AI replacement erode human influence over society?" Gradual disempowerment is what happens when the direction of aligning humans with AI is neglected: humans lose the capacity to oversee and direct AI, not through any dramatic failure but through incremental capability erosion. Bidirectional alignment is the explicit countermeasure.

The framework also complements the note "What breaks when humans and AI models misunderstand each other?" Mutual Theory of Mind (MToM) addresses the cognitive layer of bidirectional alignment: how humans and AI build models of each other. The bidirectional alignment framework adds behavioral and societal layers.

Original note title

bidirectional human-AI alignment reframes alignment as reciprocal — aligning humans with AI is the underexplored dimension