Can structured debate roles help small models detect ambiguity?
Small language models struggle to recognize when problems are underspecified. Can assigning explicit leader-follower roles in multi-agent debates overcome this limitation and boost ambiguity detection accuracy?
Small models (7-9B parameters) individually struggle with ambiguity detection, that is, recognizing when a problem statement is underspecified or admits multiple interpretations. But a structured multi-agent debate protocol with explicit leader-follower roles and role rotation significantly boosts performance: Mistral-7B-led debates achieve a 76.7% success rate, well beyond single-model baselines.
The protocol matters more than the models. A leader agent proposes an interpretation, two follower agents challenge or extend it, and roles rotate across rounds. The two-follower configuration creates a stronger consensus mechanism than pairwise debate because disagreement must survive two independent challenges rather than one. This mechanism differs from the one behind the general multi-agent debate finding (see "When does debate actually improve reasoning accuracy?"): ambiguity detection is a recognition problem, not a verifiability problem, and the structured role protocol prevents the persuasive-framing failure mode by forcing role rotation.
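The role structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the protocol from the source paper: the `Agent` signature, the per-round majority vote, and the majority-across-rounds aggregation are all assumptions made here, and the three toy agents are keyword heuristics standing in for real model calls.

```python
from typing import Callable, List, Tuple

# Hypothetical agent interface: (role, problem, transcript) -> True if "ambiguous".
Agent = Callable[[str, str, List[str]], bool]


def rotate_roles(n_agents: int, round_idx: int) -> Tuple[int, List[int]]:
    """Pick the leader for this round; every other agent is a follower."""
    leader = round_idx % n_agents
    followers = [i for i in range(n_agents) if i != leader]
    return leader, followers


def run_debate(agents: List[Agent], problem: str, rounds: int = 3) -> bool:
    """Return True if the debate concludes the problem is ambiguous."""
    transcript: List[str] = []
    round_verdicts: List[bool] = []
    for r in range(rounds):
        leader, followers = rotate_roles(len(agents), r)
        # The leader proposes a verdict first.
        votes = [agents[leader]("leader", problem, transcript)]
        transcript.append(f"round {r}: leader {leader} says ambiguous={votes[0]}")
        # Followers respond independently: each sees the same snapshot,
        # which includes the leader's proposal but not the other follower.
        snapshot = list(transcript)
        for f in followers:
            vote = agents[f]("follower", problem, snapshot)
            transcript.append(f"round {r}: follower {f} says ambiguous={vote}")
            votes.append(vote)
        # Assumed consensus rule: strict majority within the round.
        round_verdicts.append(sum(votes) * 2 > len(votes))
    # Assumed aggregation: strict majority across rounds.
    return sum(round_verdicts) * 2 > len(round_verdicts)


# Toy stand-ins for model calls (keyword heuristics, not real detectors).
def flags_pronouns(role: str, problem: str, transcript: List[str]) -> bool:
    return " it " in f" {problem.lower()} "


def flags_missing_units(role: str, problem: str, transcript: List[str]) -> bool:
    text = problem.lower()
    return "how long" in text and "minutes" not in text


def never_flags(role: str, problem: str, transcript: List[str]) -> bool:
    return False


agents = [flags_pronouns, flags_missing_units, never_flags]
print(run_debate(agents, "How long does it take?"))
print(run_debate(agents, "Add 2 and 3."))
```

With three agents and three rounds, each agent leads exactly once, so no single agent's framing sets the agenda for the whole debate. A real implementation would replace the toy agents with model calls and would likely let followers extend the leader's interpretation rather than only vote on it.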
The result is notable because ambiguity detection is a prerequisite for the information-seeking behavior that models systematically lack. As discussed in "Can models identify what information they actually need?", the ability to detect ambiguity is upstream of the ability to ask clarifying questions. Leader-follower debate offers a multi-agent route to a capability that single models achieve at only 40-50% accuracy (QuestBench).
This also connects to the broader finding discussed in "Does cognitive diversity alone improve multi-agent ideation quality?". The leader-follower protocol imposes structural diversity through role assignment rather than relying on emergent diversity, a design choice that may explain why it works with small models that individually lack the expertise threshold.
Inquiring lines that use this note as a source (24)
This note is a source for these synthesized inquiries.
- How do social correctives prevent premature consensus in human debate?
- Why does debate alone amplify errors in contested factual domains?
- What makes ambiguity recognition fundamentally important for poetry analysis?
- How does semantic ambiguity differ from structural ambiguity in language?
- Does structured debate between agent groups improve evaluation consensus more than independent scoring?
- Can structured dissent mechanisms replace genuine multi-model debate?
- Can single-model internal dialogue replace multi-agent debate systems?
- Can debate-style multi-agent systems be trusted on contested factual domains?
- Does role rotation prevent multi-agent debate from amplifying persuasive framing errors?
- Why does ambiguity detection require different multi-agent mechanisms than verifiable reasoning tasks?
- How does ambiguity detection connect to models' ability to ask clarifying questions?
- What structural changes enable agents to ask clarifying questions?
- What happens when AI discourse lacks a position to defend?
- How does the inability to manage ambiguity undermine literary analysis tasks?
- Can prompt engineering and external knowledge bases fix ambiguity recognition failures?
- What role does search capacity play in making debate more accurate?
- Does debate between agents actually improve reasoning on contested domains?
- Why does standard RAG succeed for evidence-based but fail for debate questions?
- How do comparison and debate questions differ in their aspect retrieval needs?
- Does adding multiple interpretations to ambiguous situations respect language more than resolving them?
- Can multi-agent debate prevent the confident convergence on wrong answers?
- How does multi-agent debate differ from single-model self-revision in fixing errors?
- Why do language models struggle with evaluative tasks like weighing competing viewpoints?
- Can multi-agent debate prevent reasoning models from amplifying errors?
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
- We’re Afraid Language Models Aren’t Modeling Ambiguity
- Aligning Language Models to Explicitly Handle Ambiguity
- Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
- Debating with More Persuasive LLMs Leads to More Truthful Answers
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
- Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates
Original note title
leader-follower multi-agent debate enhances ambiguity detection in small models through structured role rotation and consensus forcing