SYNTHESIS NOTE

Why do the same users rate items differently each time?

User ratings are usually assumed to be clean preference signals, but do they in fact fluctuate from one session to the next? This matters because recommender systems treat ratings as ground truth, yet temporal inconsistency and individual rating styles may contaminate that signal.

Synthesis note · 2026-05-03 · sourced from Recommenders General

The conventional reason recommender systems prefer explicit ratings (star ratings, thumbs up/down) over implicit feedback (clicks, watch time) is that explicit ratings are clean preference data. The user is directly stating "I like this." Amatriain, Pujol, and Oliver's experimental study evaluates this assumption and finds it doesn't hold.

The study has users rate the same items multiple times across spaced sessions. The same user gives substantially different ratings to the same item depending on when they rate. The variation goes beyond marginal noise: users sometimes shift by multiple stars on the same item across sessions. The number of stars on a 5-star scale is therefore not a stable property of the user's preference; it depends on mood, context, recently consumed alternatives, and the user's rating style at that moment.
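
The re-rating design described above can be summarized with a simple test-retest measure. The following is a minimal sketch, not the study's own analysis: the data layout (a list of `(user, item, session, stars)` tuples), the function name `retest_inconsistency`, and the toy values are all assumptions made for illustration. Over every user-item pair rated in both of two sessions, it reports the share of pairs whose rating changed and the root-mean-square difference between the sessions.

```python
import math
from collections import defaultdict


def retest_inconsistency(ratings, session_a, session_b):
    """Compare ratings of the same (user, item) pair across two sessions.

    ratings: iterable of (user, item, session, stars) tuples (hypothetical layout).
    Returns (changed_fraction, rmse) over pairs rated in both sessions.
    """
    # Group each pair's ratings by the session in which they were given.
    by_pair = defaultdict(dict)
    for user, item, session, stars in ratings:
        by_pair[(user, item)][session] = stars

    # Keep only pairs that were rated in both sessions.
    diffs = [
        sessions[session_a] - sessions[session_b]
        for sessions in by_pair.values()
        if session_a in sessions and session_b in sessions
    ]
    if not diffs:
        raise ValueError("no (user, item) pair was rated in both sessions")

    changed_fraction = sum(1 for d in diffs if d != 0) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return changed_fraction, rmse


# Toy data, invented for illustration only.
toy = [
    ("u1", "film_a", 1, 5), ("u1", "film_a", 2, 3),
    ("u1", "film_b", 1, 4), ("u1", "film_b", 2, 4),
    ("u2", "film_a", 1, 2), ("u2", "film_a", 2, 3),
    ("u2", "film_b", 1, 4),  # rated only once, so it is skipped
]
changed, rmse = retest_inconsistency(toy, 1, 2)
print(f"changed: {changed:.2f}, rmse: {rmse:.2f}")
```

A between-session RMSE of this kind is a useful reference point: a model evaluated against ratings this unstable cannot meaningfully be expected to predict them more accurately than the users reproduce them.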

The noise comes from multiple sources. Temporal inconsistency: the user's true preference may have shifted, but more often the rating itself fluctuates around a stable preference. Rater-specific style: some users use the full scale, some use only the top half, and these styles drift. Anchoring effects: a rating depends on what other items the user has recently rated.

The implication for recommender systems: rating data is preference plus rating noise plus rater style, and conflating the three produces biased models. Treating "5 stars" as a categorical label for "liked" understates the noise; treating the difference between 4 and 5 stars as meaningful overstates user precision. The paper undermines the cleanliness assumption that justified the field's preference for explicit ratings. Combined with the implicit-feedback availability and self-selection issues raised elsewhere in the literature, this suggests the choice between explicit and implicit signals is more nuanced than the methodological canon admits.
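
One common way to keep rater style from being read as preference is to standardize each user's ratings against that user's own mean and spread before modeling. The sketch below is an illustration under assumptions, not a method taken from the paper: it treats rater style as nothing more than a per-user offset and scale, the function name `standardize_per_user` and the toy data are hypothetical, and it does nothing about session-to-session noise, which requires repeated ratings to estimate.

```python
import statistics
from collections import defaultdict


def standardize_per_user(ratings):
    """Convert raw stars to per-user z-scores.

    ratings: iterable of (user, item, stars) tuples, one rating per
    (user, item) pair (hypothetical layout).
    Returns {(user, item): z}. Users with zero spread get z = 0.0
    because their ratings carry no relative information.
    """
    by_user = defaultdict(list)
    for user, item, stars in ratings:
        by_user[user].append((item, stars))

    result = {}
    for user, rows in by_user.items():
        values = [stars for _, stars in rows]
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)  # population standard deviation
        for item, stars in rows:
            result[(user, item)] = (stars - mean) / spread if spread > 0 else 0.0
    return result


# Toy data: "top_half" only uses 3 to 5 stars, "full_scale" uses 1 to 5.
toy = [
    ("top_half", "a", 3), ("top_half", "b", 4), ("top_half", "c", 5),
    ("full_scale", "a", 1), ("full_scale", "b", 3), ("full_scale", "c", 5),
]
z = standardize_per_user(toy)
print(z[("top_half", "a")], z[("full_scale", "a")])
```

In this toy data, a 3 from the user who only uses the top half of the scale and a 1 from the user who uses the full scale map to the same standardized score, since each is that user's lowest rating.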

Inquiring lines that read this note (19)

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How do social dynamics and selection effects compound in rating aggregates?

How can recommendation systems balance personalization with stability and coverage?

What makes specific clarifying questions more effective than generic ones?

Can graded relevance assumptions hold when user ratings are temporally inconsistent?

How can we distinguish genuine user preferences from measurement artifacts?

Related concepts in this collection (4)

Can implicit feedback reveal both preference and confidence? When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
extends: noisy explicit ratings make the case for implicit feedback's preference-plus-confidence structure stronger
Do online reviews actually measure product quality or just buyer preferences? Online reviews come only from customers who already expected to like a product. This self-selection might hide the true quality signal beneath layers of preference bias and writing motivation. What can aggregated ratings actually tell us?
complements: rating noise compounds with self-selection bias — both undermine the "ratings as ground truth" assumption
Why do online reviewers publish negative ratings despite positive experiences? When people post reviews publicly, do they adjust their honest opinions to seem more discerning? Schlosser's experiments test whether audience awareness shifts how people rate products compared to private ratings.
complements: rater-style and audience-effects together describe how the same private preference can produce wildly different public ratings
Do online ratings actually reflect independent customer opinions? How much do previously-posted ratings shape the ones that come after, and does this social influence distort what ratings supposedly measure? Understanding this matters for anyone relying on review aggregates to judge product quality.
extends: the noise described here is within-user; social dynamics add a between-user noise component
