Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal
Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine normative uncertainty rather than agent error. Building on prior work on reasoning-trace disagreement in human-AI collaborative moderation, we propose a knowledge-representation layer in which reasoning traces and agent decisions are abstracted into symbolic disagreement states. For agents that produce explicit reasoning traces and binary decisions, we distinguish four states according to reasoning similarity and conclusion agreement: convergent agreement, divergent agreement, convergent disagreement, and divergent disagreement. These states serve as the premises of defeasible strategic routing rules. We instantiate the framework in content moderation and argue that disagreement-aware routing provides a bridge between sub-symbolic LLM deliberation and symbolic knowledge representation for multi-agent strategic reasoning.
Introduction. LLM-based multi-agent systems are increasingly used as collective reasoning architectures (Liang et al. 2024; Chen, Saha, and Bansal 2024) in which several agents deliberate, debate, or aggregate judgments before producing a final output (Kostka and Chudziak 2025; Sadowski and Chudziak 2025). Existing approaches typically treat inter-agent disagreement as a defect to be reduced through majority voting, additional debate rounds, or robust aggregation (Du et al. 2024; Chen, Saha, and Bansal 2024; Liang et al. 2024; Zheng et al. 2025; Zhang et al. 2025). This is plausible for instrumental tasks where disagreement signals noise or reasoning failure. It is far less appropriate for value-laden tasks, where disagreement may be a stable property of the decision problem itself. Content moderation is paradigmatic (Gajewska et al. 2026).
Discussion / Conclusion. From Consensus to Strategic Escalation. The framework reframes the design goal of LLM-based multi-agent systems. A consensus-seeking system asks how agents can be made to agree; a disagreement-aware system asks what the structure of disagreement implies about the appropriate next action. Reasoning traces are central to this shift: a vote alone does not reveal whether agents disagree because they read the case differently or because they weigh shared considerations differently. By comparing traces and decisions jointly, the controller distinguishes interpretive from evaluative disagreement, in a manner reminiscent of argumentation frameworks where conclusions depend on the structure of supporting and attacking reasons (Dung 1995; Rahwan and Simari 2009; Amgoud and Prade 2009). The convergent-disagreement state CD(c) does the most strategic work. In factual tasks it would look like inconsistency; in normative tasks it more plausibly indicates that agents share a description of the case but differ in value prioritization. Collapsing such cases into a single automatic decision risks hiding legitimately contested situations. Escalation here is not a failure of automation but a rational meta-action under normative uncertainty.