INQUIRING LINE

Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment · Training, RL, and Test-Time Scaling · cross-cluster

How do interpretive and evaluative disagreement show up differently in agent traces?

This explores the difference between two kinds of disagreement an agent's reasoning leaves behind — disputes about what the input *means* (interpretive) versus disputes about what the right *call* is once the facts are settled (evaluative) — and how each leaves a different fingerprint in the trace.

The corpus suggests the two kinds of disagreement sit at different points in the trace and call for different responses: interpretive disagreement appears early, at the moment of construal, while evaluative disagreement appears late, at the moment of judgment.

Interpretive disagreement is upstream. It's the finding that a single sentence is irreducibly multiple: different readers, occupying different social positions, legitimately construe the same words differently, and this shows up as a *distribution* of meanings rather than one right answer plus noise (see "Why do readers interpret the same sentence so differently?"). In a multi-agent trace, this surfaces as a fork at the input-understanding stage: agents diverge before they ever start reasoning toward a conclusion. That's exactly the signal a leader-follower debate protocol is built to expose: one agent proposes interpretations and others challenge them, so ambiguity gets detected and named instead of being silently resolved by whichever framing came first (see "Can structured debate roles help small models detect ambiguity?").
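The leader-follower idea can be sketched in a few lines. This is a minimal illustration, not the protocol from the cited note: the `Agent` interface (a callable returning the set of readings an agent finds plausible), the function names, and the stub agents are all assumptions made for the example, and the consensus-forcing step is left out.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical agent interface: maps an input sentence to the set of
# readings that agent considers plausible.
Agent = Callable[[str], Set[str]]


def role_schedule(names: List[str], rounds: int) -> List[Tuple[str, List[str]]]:
    """Rotate the leader role so no single agent's framing always comes first."""
    schedule = []
    for r in range(rounds):
        leader = names[r % len(names)]
        followers = [n for n in names if n != leader]
        schedule.append((leader, followers))
    return schedule


def run_debate(sentence: str, agents: Dict[str, Agent], rounds: int = 3) -> Set[str]:
    """Collect every reading surfaced by a leader proposal or a follower challenge."""
    readings: Set[str] = set()
    for leader, followers in role_schedule(list(agents), rounds):
        proposed = agents[leader](sentence)
        readings |= proposed
        for follower in followers:
            # A challenge here is any reading the follower holds
            # that the leader did not propose.
            readings |= agents[follower](sentence) - proposed
    return readings


def is_ambiguous(readings: Set[str]) -> bool:
    """More than one surviving reading means the fork is interpretive."""
    return len(readings) > 1


# Stub agents standing in for model calls (illustrative only).
stub_agents: Dict[str, Agent] = {
    "a": lambda s: {"bank = river edge"},
    "b": lambda s: {"bank = financial institution"},
    "c": lambda s: {"bank = financial institution"},
}
found = run_debate("She waited by the bank.", stub_agents)
```

With these stubs, `found` holds both readings, so the input is flagged as ambiguous and not settled by whichever agent happened to speak first.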

Evaluative disagreement looks completely different. Here the agents *share* the factual reasoning and still land in different places, and that convergent-yet-divergent pattern is the tell. It marks genuinely contested normative territory, the kind of thing that should be escalated to a human rather than averaged away by a vote (see "Can disagreement in reasoning traces signal legitimate value conflicts?"). The crucial point is that majority voting destroys this signal: it treats a real value conflict as noise and papers over the exact thing a reader most needs to see. So interpretive disagreement says "we didn't read the question the same way"; evaluative disagreement says "we read it the same way and still don't agree on the answer."
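A small sketch makes the contrast with majority voting concrete. It assumes each agent run has already been reduced to two comparable labels, a `construal` and a `verdict`; that extraction step is itself hard and is not shown. The class, function names, and return strings are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentRun:
    construal: str  # how the agent read the input (assumed already extracted)
    verdict: str    # what the agent concluded


def classify(runs: List[AgentRun]) -> str:
    """Label the disagreement by where the runs first differ."""
    if len({r.construal for r in runs}) > 1:
        return "interpretive"  # agents read the input differently
    if len({r.verdict for r in runs}) > 1:
        return "evaluative"    # same reading, different call
    return "consensus"


def majority_vote(runs: List[AgentRun]) -> str:
    """The baseline that hides the split: keep only the most common verdict."""
    return Counter(r.verdict for r in runs).most_common(1)[0][0]


def resolve(runs: List[AgentRun]) -> str:
    """Route by disagreement type instead of voting."""
    kind = classify(runs)
    if kind == "interpretive":
        return "CLARIFY_INPUT"
    if kind == "evaluative":
        return "ESCALATE_TO_HUMAN"
    return runs[0].verdict


runs = [
    AgentRun("refund request", "approve"),
    AgentRun("refund request", "approve"),
    AgentRun("refund request", "deny"),
]
```

Here `majority_vote(runs)` returns `"approve"` and the dissent disappears, while `resolve(runs)` returns `"ESCALATE_TO_HUMAN"` because the runs share a construal and still split on the verdict.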

Telling them apart in a trace is harder than it sounds, because traces are unreliable narrators. Reasoning text is largely stylistic surface: invalid traces routinely produce correct answers, so the prose is learned formatting, not a log of computation (see "Do reasoning traces actually cause correct answers?"). Worse, the discourse structure a trace *presents* often diverges from its actual internal causal pathway, so a disagreement that reads as interpretive on the page may not reflect what really drove the split (see "Do reasoning traces actually show how models think?"). This is why locating disagreement matters: the sentences that genuinely steer a trace are sparse planning and backtracking pivots, called "thought anchors", and a fork at an anchor is more diagnostic than divergence in the filler around it (see "Which sentences actually steer a reasoning trace?").

The practical payoff is in how you handle each. Interpretive splits want process-level checking: verifying intermediate construal and policy compliance as the trace unfolds, which catches the early misreadings that final-answer scoring misses entirely (see "Where do reasoning agents actually fail during long traces?"). Evaluative splits want contestability: structuring the output as an attack/defense argument graph so a human can pinpoint and reject the specific premise or value at stake instead of arguing with an opaque verdict (see "Can formal argumentation make AI decisions truly contestable?"). The thing worth carrying away: a disagreeing trace isn't a failure to be voted into consensus. *Where* and *when* the disagreement appears tells you whether you have a comprehension problem to fix or a value conflict to escalate.
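To illustrate what contestability could look like in practice, here is a minimal sketch of an attack graph evaluated under Dung's grounded semantics. The argument names are invented for the example, and the choice of grounded semantics (rather than, say, preferred or stable) is an assumption; the cited note only says the outputs are structured as attack/defense graphs.

```python
from typing import Set, Tuple


def grounded_extension(arguments: Set[str], attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Iterate the characteristic function from the empty set:
    accept an argument once every one of its attackers is itself
    attacked by an already accepted argument."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted: Set[str] = set()
    while True:
        defeated = {y for (x, y) in attacks if x in accepted}
        new = {a for a in arguments if attackers[a] <= defeated}
        if new == accepted:
            return accepted
        accepted = new


# Hypothetical decision: approve, unless a policy violation stands,
# unless an exception applies. Each pair is (attacker, target).
args = {"approve", "policy_violation", "exception_applies"}
atts = {
    ("policy_violation", "approve"),
    ("exception_applies", "policy_violation"),
}
before = grounded_extension(args, atts)

# A human contests one specific premise by adding a counter-argument.
args_contested = args | {"exception_expired"}
atts_contested = atts | {("exception_expired", "exception_applies")}
after = grounded_extension(args_contested, atts_contested)
```

Before the challenge, `approve` is accepted because `exception_applies` defeats the policy objection. After the human attacks that one premise, `policy_violation` is reinstated and `approve` drops out, so the disagreement is pinned to a single named node and not to the verdict as a whole.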

Sources 8 notes

Why do readers interpret the same sentence so differently?

Interpretation Modeling research shows that disagreement on socially embedded sentences reflects valid differences in reader perspective, not annotation failure. Structured human disagreement in NLI benchmarks confirms that interpretation distributions carry meaningful information.

Can structured debate roles help small models detect ambiguity?

Mistral-7B achieved 76.7% accuracy in ambiguity detection through a protocol where a leader proposes interpretations and two followers challenge them with rotating roles. Role rotation and consensus forcing prevent persuasive framing failures and create stronger verification than pairwise debate.

Can disagreement in reasoning traces signal legitimate value conflicts?

When agents share factual reasoning but reach different conclusions, this convergent disagreement marks legitimately contested normative territory. Treating it as noise to suppress via consensus actively destroys the signal about what requires escalation rather than automation.

Do reasoning traces actually cause correct answers?

R1's intermediate tokens carry no special execution semantics and are generated identically to other LLM output. Invalid traces frequently produce correct answers, proving traces are not causally necessary—they correlate with answers via learned formatting, not functional reasoning.

Do reasoning traces actually show how models think?

ReasoningFlow found that most erroneous steps in traces don't influence final answers, and critically, the discourse structure traces present linguistically does not match their actual internal causal pathways. This gap suggests traces are narrative surface rather than verified computation logs.

Which sentences actually steer a reasoning trace?

Counterfactual resampling, attention analysis, and causal suppression all identify planning and backtracking sentences as thought anchors—sparse critical points that guide subsequent reasoning. These are functional pivots, not noise.

Where do reasoning agents actually fail during long traces?

Reliability for long-trace reasoning comes from checking intermediate states and policy compliance during generation, not from scoring final outputs. Adding intermediate verification raised task success from 32% to 87% because most failures are process violations, not wrong answers.

Can formal argumentation make AI decisions truly contestable?

Dung-style argumentation structures AI outputs as traversable attack/defense graphs, allowing users to identify and contest specific premises. Standard LLM outputs lack this structure, making it impossible to pinpoint which claims users actually reject.
