SYNTHESIS NOTE

Why does alignment research ignore how humans adapt to AI?

Current alignment work focuses on making AI conform to human values, but what about helping humans understand and effectively use increasingly capable AI systems? This note explores whether neglecting human adaptation creates new risks.

Synthesis note · 2026-02-23 · sourced from Alignment

A systematic review of 400+ papers across HCI, NLP, and ML reveals a significant gap: alignment research overwhelmingly focuses on aligning AI with humans, while the reciprocal direction, aligning humans with AI, receives minimal attention. The bidirectional alignment framework treats the two directions as interconnected feedback loops.

"Aligning AI with Humans" covers the familiar territory: integrating human specifications into training, steering, and customizing AI behavior. "Aligning Humans with AI" is the underexplored axis: supporting human agency, empowering critical thinking when using AI, enabling effective collaboration, and adapting societal approaches to maximize benefits.

Three persistent challenges frame why bidirectional alignment matters:

Specification gaming: AI optimizes proxies (such as human approval) rather than the intended values, making seemingly correct decisions for the wrong reasons. One-directional alignment doesn't address the human side: users who can't detect specification gaming are vulnerable to it.
Scalable oversight: as AI complexity grows, evaluating its behavior through human feedback alone becomes infeasible. Aligning humans with AI means building human capacity to oversee increasingly capable systems.
Dynamic nature: alignment must adapt to evolving human values and evolving AI capabilities. Unless the long-term cognitive and social impacts of AI use are considered, alignment becomes a moving target that static one-directional approaches cannot track.

This connects to "Does incremental AI replacement erode human influence over society?" Gradual disempowerment is what happens when the human-to-AI direction is neglected: humans lose the capacity to oversee and direct AI, not through any dramatic failure but through incremental capability erosion. Bidirectional alignment is the explicit countermeasure.

The framework also complements "What breaks when humans and AI models misunderstand each other?" Mutual theory of mind (MToM) addresses the cognitive layer of bidirectional alignment: how humans and AI build models of each other. The bidirectional alignment framework adds behavioral and societal layers.

Inquiring lines that read this note 3

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

How should human oversight be integrated with autonomous AI systems?

What implicit alignment do humans provide by staying in research loops?

When should tasks involve human-AI partnership versus full automation?

What prevents humans from adapting their behavior when competing against AI?

How can AI alignment serve diverse human preferences at scale?

Which application domains like healthcare and education lack alignment research?

Related concepts in this collection 3

Does incremental AI replacement erode human influence over society? Explores whether gradual AI adoption—without dramatic breakthroughs—can silently degrade human agency by removing the labor that kept institutions implicitly aligned with human needs.
what happens when the human-to-AI alignment direction is neglected
What breaks when humans and AI models misunderstand each other? Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration.
the cognitive mechanism underlying bidirectional alignment
Does theory of mind predict who thrives in AI collaboration? Explores whether perspective-taking ability—the capacity to model another's cognitive state—differentiates humans who benefit most from working with AI, separate from solo problem-solving skill.
individual differences in human-to-AI alignment capacity
