How should systems handle contradictory opinions in user reviews?
When customers disagree about a product or service, should dialogue systems present all perspectives or select one? Understanding how to aggregate and balance diverse opinions affects whether users trust the response.
Most task-oriented dialogue (TOD) research focuses on factual knowledge: FAQs, product specifications, service guides. But in many TOD tasks, users care about subjective insights: the experiences, opinions, and preferences of other customers. Questions such as "Is the Wi-Fi reliable?" or "Does the restaurant have a good atmosphere?" require subjective knowledge that factual databases cannot provide.
SK-TOD (Subjective-Knowledge-based Task-Oriented Dialogue) formalizes this gap. The key challenge is that customers may hold different opinions even about the same aspect of a product or service. A hotel's Wi-Fi might draw 70% positive and 30% negative reviews. The system's response should then include both perspectives along with their proportions, because two-sided responses have been recognized as more credible and valuable for customers.
This introduces three new challenges beyond standard TOD:
- Knowledge source shift — from structured databases to unstructured user reviews
- Opinion aggregation — synthesizing diverse, sometimes contradictory viewpoints
- Balanced presentation — representing both sides proportionally rather than cherry-picking
Current TOD approaches trained on factual knowledge fail at this because they are designed to retrieve single correct answers, not to aggregate and balance multiple perspectives.
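The opinion-aggregation and balanced-presentation steps can be illustrated with a minimal Python sketch. It assumes each review has already been tagged "positive" or "negative" for one aspect by some upstream classifier; the function name `two_sided_response` and the response wording are hypothetical and are not taken from SK-TOD.

```python
def two_sided_response(aspect, labels):
    """Build a reply that reports both sides of an aspect with proportions.

    `labels` is a list of per-review sentiment strings for a single aspect.
    Only "positive" and "negative" are counted; other labels are ignored.
    (Illustrative assumption: labels come from an upstream classifier.)
    """
    pos = sum(1 for label in labels if label == "positive")
    neg = sum(1 for label in labels if label == "negative")
    total = pos + neg
    if total == 0:
        return f"I could not find any reviews that mention the {aspect}."
    if neg == 0:
        return f"All {total} reviews that mention the {aspect} are positive."
    if pos == 0:
        return f"All {total} reviews that mention the {aspect} are negative."
    pos_pct = round(100 * pos / total)
    neg_pct = 100 - pos_pct  # keeps the two shares summing to 100
    return (
        f"About {pos_pct}% of reviewers were positive about the {aspect}, "
        f"while about {neg_pct}% reported problems."
    )


# The 70/30 hotel Wi-Fi case, with made-up labels.
wifi_labels = ["positive"] * 7 + ["negative"] * 3
print(two_sided_response("Wi-Fi", wifi_labels))
```

The design choice worth noting is that the minority view is always surfaced with its share rather than dropped, which is the opposite of retrieving a single "correct" answer.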
Multi-source enrichment as a partial fix: M-OS (Multi-Source Opinion Summarization) demonstrates that enriching review-based opinion summaries with technical specifications and product descriptions yields an 87% user preference over standard opinion-only summaries. The mechanism: factual enrichment enables precise product comparisons that review-only approaches lack, which addresses decision fatigue and information overload. M-OS evaluates across 7 dimensions (fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, specificity) and achieves a Spearman correlation of ρ = 0.74 with human judgment. The implication for SK-TOD: combining subjective review aggregation with factual specifications creates more useful and complete responses than either alone.
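For readers unfamiliar with the agreement measure, Spearman's ρ is the Pearson correlation computed on ranks instead of raw scores. The sketch below is a plain-Python illustration of that definition, not the M-OS evaluation code; the function names and the example scores are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five summaries: an automatic metric vs. human ratings.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_ratings = [1, 2, 3, 5, 4]
print(spearman(metric_scores, human_ratings))
```

Because only ranks matter, ρ rewards a metric for ordering summaries the way humans do, regardless of the scale either side uses.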
This connects to a broader theme in the vault. As the note "Can LLMs generate more novel ideas than human experts?" discusses, LLMs have difficulty with evaluative tasks in general. Aggregating subjective reviews requires exactly this evaluative stance: weighing perspectives, judging representativeness, and presenting a balanced view rather than a confident single answer.
Inquiring lines that use this note as a source 8
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Does the interface design itself shape how much content users will review?
- Does user preference for confirmation override model capability for disagreement?
- How do early reviewers shape what later buyers think a product is?
- Why do posters acknowledge multiple viewpoints without integrating them into coherent judgments?
- Can factual product data improve the credibility of subjective opinion summaries?
- Who decides which stakeholder perspective gets embedded in the pipeline?
- Can personalized systems reward honest disagreement instead of user confirmation?
- Why does showing counterarguments restore users' ability to discriminate?
Related concepts in this collection 2
- Can LLMs generate more novel ideas than human experts?
  Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
  subjective knowledge aggregation requires evaluative stance
- Why do readers interpret the same sentence so differently?
  How much of annotation disagreement in NLP reflects genuine interpretive multiplicity rather than error? This explores whether social position and moral framing systematically generate competing but equally valid readings.
  subjective reviews embody irreducibly multiple interpretations of the same experience
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- “What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge
- OpinionConv: Conversational Product Search with Grounded Opinions
- Evaluating Emotional Nuances In Dialogue Summarization
- Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation
- RevCore: Review-augmented Conversational Recommendation
- Reranking-based Generation for Unbiased Perspective Summarization
- Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
Original note title
task-oriented systems that incorporate subjective knowledge from user reviews need to aggregate diverse opinions including positive and negative perspectives for credibility