How should systems handle contradictory opinions in user reviews?

When customers disagree about a product or service, should dialogue systems present all perspectives or select one? Understanding how to aggregate and balance diverse opinions affects whether users trust the response.

Synthesis note · 2026-02-22 · sourced from Conversation Architecture Structure

Most task-oriented dialogue (TOD) research focuses on factual knowledge: FAQs, product specifications, service guides. But in many TOD tasks, users care about subjective insights, meaning the experiences, opinions, and preferences of other customers. Questions such as "Is the Wi-Fi reliable?" or "Does the restaurant have a good atmosphere?" require subjective knowledge that factual databases cannot provide.

SK-TOD (Subjective-Knowledge-based Task-Oriented Dialogue) formalizes this gap. The key challenge is that customers may hold different opinions even about the same aspect of a product or service. A hotel's Wi-Fi might draw 70% positive and 30% negative reviews. The system's response should then include both perspectives along with their proportions, because two-sided responses have been recognized as more credible and valuable for customers.

This introduces three new challenges beyond standard TOD:

Knowledge source shift — from structured databases to unstructured user reviews
Opinion aggregation — synthesizing diverse, sometimes contradictory viewpoints
Balanced presentation — representing both sides proportionally rather than cherry-picking

Current TOD approaches trained on factual knowledge fail at this because they are designed to retrieve single correct answers, not to aggregate and balance multiple perspectives.
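
The opinion aggregation and balanced presentation steps can be illustrated with a minimal sketch. It assumes that reviews have already been tagged with an aspect and a sentiment label by some upstream component; the Review class, the function names, and the response wording are hypothetical and are not taken from SK-TOD itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    aspect: str     # e.g. "wifi"; assumed to be extracted upstream
    sentiment: str  # "positive" or "negative"; assumed to be labeled upstream
    text: str


def aspect_proportions(reviews, aspect):
    """Return the share of positive and negative reviews for one aspect."""
    counts = Counter(r.sentiment for r in reviews if r.aspect == aspect)
    total = counts["positive"] + counts["negative"]
    if total == 0:
        return None  # no subjective knowledge available for this aspect
    return {
        "positive": counts["positive"] / total,
        "negative": counts["negative"] / total,
        "total": total,
    }


def two_sided_response(reviews, aspect):
    """Build a response that reports both sides with their proportions."""
    props = aspect_proportions(reviews, aspect)
    if props is None:
        return f"I could not find any reviews that mention the {aspect}."
    pos_pct = round(props["positive"] * 100)
    neg_pct = 100 - pos_pct
    if pos_pct >= neg_pct:
        majority, minority = "positive", "negative"
    else:
        majority, minority = "negative", "positive"
    major_pct, minor_pct = max(pos_pct, neg_pct), min(pos_pct, neg_pct)
    if minor_pct == 0:
        return f"All {props['total']} reviews mentioning the {aspect} are {majority}."
    return (
        f"Of {props['total']} reviews mentioning the {aspect}, about "
        f"{major_pct}% are {majority} and {minor_pct}% are {minority}."
    )


# Illustrative data only: 7 positive and 3 negative Wi-Fi reviews.
reviews = (
    [Review("wifi", "positive", "Fast and stable.")] * 7
    + [Review("wifi", "negative", "Kept dropping in my room.")] * 3
)
print(two_sided_response(reviews, "wifi"))
```

The sketch reports the minority view even when it is small, which is the point of balanced presentation: a system that retrieves a single best-matching review would surface only one side.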

Multi-source enrichment as a partial fix: M-OS (Multi-Source Opinion Summarization) demonstrates that enriching review-based opinion summaries with technical specifications and product descriptions earns 87% user preference over standard opinion-only summaries. The mechanism: factual enrichment enables precise product comparisons that review-only approaches lack, addressing decision fatigue and information overload. M-OS evaluates across 7 dimensions (fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, specificity) and achieves a Spearman correlation of ρ = 0.74 with human judgment. The implication for SK-TOD: combining subjective review aggregation with factual specifications creates more useful and complete responses than either alone.
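
For readers unfamiliar with the agreement measure, the following is a minimal sketch of how a Spearman rank correlation between automatic scores and human ratings can be computed. The scores are made-up illustrative values, not M-OS data, and the function names are hypothetical; this is not the M-OS evaluation code.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: automatic scores and human ratings for 5 summaries.
metric_scores = [0.2, 0.5, 0.9, 0.4, 0.7]
human_ratings = [1, 2, 5, 3, 4]
rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 means the automatic metric orders summaries almost the same way human judges do, regardless of the absolute score scale.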

This connects to a broader theme in the vault. As the note "Can LLMs generate more novel ideas than human experts?" observes, LLMs have difficulty with evaluative tasks in general. Aggregating subjective reviews requires exactly this evaluative stance: weighing perspectives, judging representativeness, and presenting a balanced view rather than a confident single answer.

Inquiring lines that read this note 8

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How do social dynamics and selection effects compound in rating aggregates?

Can model confidence signals reliably improve reasoning quality and calibration?

Does user preference for confirmation override model capability for disagreement?

Why should disagreement be treated as signal in collaborative reasoning?

Why do posters acknowledge multiple viewpoints without integrating them into coherent judgments?

Why do readers trust citations and complexity regardless of accuracy?

Can factual product data improve the credibility of subjective opinion summaries?

How do interface design choices shape consciousness attribution?

Who decides which stakeholder perspective gets embedded in the pipeline?

How do aggregate reward models systematically exclude minority user preferences?

Can personalized systems reward honest disagreement instead of user confirmation?

What makes AI persuasion effective and how can we counter it?

Why does showing counterarguments restore users' ability to discriminate?

Related concepts in this collection 2


Can LLMs generate more novel ideas than human experts? Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
subjective knowledge aggregation requires evaluative stance
Why do readers interpret the same sentence so differently? How much of annotation disagreement in NLP reflects genuine interpretive multiplicity rather than error? This explores whether social position and moral framing systematically generate competing but equally valid readings.
subjective reviews embody irreducibly multiple interpretations of the same experience

Original note title

task-oriented systems that incorporate subjective knowledge from user reviews need to aggregate diverse opinions including positive and negative perspectives for credibility
