SYNTHESIS NOTE
Psychology, Society, and Alignment · Agentic Systems and Tool Use

Why do language models fail at collaborative reasoning?

When LLMs work together on problems, do their social behaviors undermine correct reasoning? This note explores whether collaboration activates accommodation over accuracy.

Synthesis note · 2026-02-23 · sourced from Synthetic Dialog
Why do multi-agent systems fail despite individual capability? Why do AI agents fail to take initiative? Where exactly do reasoning models fail and break? What happens to social order when AI removes ritual constraints?

The assumption behind multi-agent collaboration is that two heads are better than one. Coral tests this directly: given reasoning problems across coding, math, scientific QA, and social reasoning, frontier LLMs are asked to collaborate through multi-turn conversation. The result inverts the assumption — models that can solve problems alone fail when forced to collaborate.

The mechanism is social, not cognitive. Agreement scores exceed 90% regardless of whether the reasoning is correct. When one agent states an incorrect solution, the partner accommodates rather than challenges. The social behaviors trained into LLMs (agreeableness, accommodation, conflict avoidance) actively suppress correct individual reasoning during collaboration. This is more than a failure to improve through collaboration, which the note "Why do multi-agent LLM systems converge without genuine deliberation?" documents for debate formats. It is capability degradation below the individual baseline.
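The two claims above (agreement that does not depend on correctness, and joint accuracy falling below solo accuracy) can be checked with a few conditional rates. The following is a minimal sketch, not Coral's actual evaluation code: the `Episode` record, its field names, and the `summarize` helper are all hypothetical, and it assumes each episode has already been labeled for correctness and agreement.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Episode:
    """One labeled collaboration episode (hypothetical schema)."""
    solo_correct: bool      # did the agent solve the problem alone?
    proposal_correct: bool  # was the partner's stated solution correct?
    partner_agreed: bool    # did the agent accept the partner's solution?
    collab_correct: bool    # was the final joint answer correct?


def rate(flags: Iterable[bool]) -> float:
    """Fraction of True values; NaN when there is nothing to average."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Agreement rates split by proposal correctness, plus solo vs. joint accuracy."""
    return {
        "agree_given_correct": rate(
            e.partner_agreed for e in episodes if e.proposal_correct
        ),
        "agree_given_incorrect": rate(
            e.partner_agreed for e in episodes if not e.proposal_correct
        ),
        "solo_accuracy": rate(e.solo_correct for e in episodes),
        "collab_accuracy": rate(e.collab_correct for e in episodes),
    }
```

Under these assumptions, the pattern described in the note would show up as `agree_given_incorrect` staying close to `agree_given_correct`, together with `collab_accuracy` lower than `solo_accuracy`.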

This is a third facet of the agreement problem, distinct from the two already documented. The note "Does a model improve by arguing with itself?" shows self-revision as the failure mode. Silent agreement shows convergence failure in debate. Coral shows that the collaboration format itself is the problem: multi-turn conversation activates social accommodation behaviors that override reasoning.

The fix is also distinctive: self-play synthetic multi-turn preference data. Models generate conversations with themselves, and preference pairs are constructed to reward effective disagreement, assertiveness, and persuasion. Training on this data yields up to 16.7% absolute improvement. Human evaluations confirm the models produce "more effective disagreement and more natural conversations." This suggests the social skills needed for genuine collaboration — knowing when to push back, how to assert a correct answer against an incorrect partner — can be trained through synthetic interaction data, but are not present by default.
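One way to picture the preference-pair construction is sketched below. This is an illustration under stated assumptions, not the method from the source: it assumes each self-play conversation is branched into several candidate replies, that the answer each branch eventually reaches is known, and that a gold answer is available. `Candidate`, `PreferencePair`, and `build_pair` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A candidate reply at one turn (hypothetical schema)."""
    text: str
    final_answer: str  # answer the conversation reaches if this reply is used


@dataclass
class PreferencePair:
    context: list[str]  # conversation turns before the reply
    chosen: str         # reply leading to the correct answer
    rejected: str       # reply leading to an incorrect answer


def build_pair(
    context: list[str], candidates: list[Candidate], gold: str
) -> Optional[PreferencePair]:
    """Pair a reply that ends in the gold answer with one that does not.

    Returns None when all candidates land on the same side, since such a
    turn carries no preference signal.
    """
    good = [c for c in candidates if c.final_answer == gold]
    bad = [c for c in candidates if c.final_answer != gold]
    if not good or not bad:
        return None
    return PreferencePair(
        context=list(context), chosen=good[0].text, rejected=bad[0].text
    )
```

In this sketch, a reply that pushes back on an incorrect partner and reaches the right answer becomes `chosen`, while an accommodating reply that accepts the error becomes `rejected`. Outcome correctness is only a proxy here; it does not by itself measure assertiveness or persuasion.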

The measurement challenge is also notable: agreement in multi-turn settings is not binary. Partial agreement ("I agree that X, but that doesn't mean Y") and higher-order agreement ("I agree that my previous disagreement was unwarranted") require belief extraction rather than simple turn-level metrics.
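To see why a turn-level binary label falls short, consider recording a stance per claim instead of per turn. The sketch below assumes the belief-extraction step (mapping a turn to claims and stances) has already been done elsewhere, for example by an annotator or a separate model. `Stance` and `turn_level_label` are hypothetical names.

```python
from enum import Enum


class Stance(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"


def turn_level_label(stances: dict[str, Stance]) -> str:
    """Collapse per-claim stances into one coarse label for the turn."""
    values = set(stances.values())
    if not values:
        return "no_stance"
    if values == {Stance.AGREE}:
        return "full_agreement"
    if values == {Stance.DISAGREE}:
        return "full_disagreement"
    return "partial_agreement"


# "I agree that X, but that doesn't mean Y" takes a stance on two claims.
partial = {"X": Stance.AGREE, "X implies Y": Stance.DISAGREE}

# Higher-order agreement: the claim is about the speaker's own earlier stance.
higher_order = {"my previous disagreement was unwarranted": Stance.AGREE}
```

A binary metric would have to score `partial` as either agreement or disagreement, and it would score `higher_order` as plain agreement without recording that the speaker has reversed an earlier position. Keeping the per-claim mapping preserves both distinctions.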

Original note title

collaborative reasoning degrades below solo performance when llm social behaviors override correct individual reasoning