Why do the same users rate items differently each time?
User ratings are assumed to be clean preference signals, but do they actually fluctuate unpredictably? This matters because recommender systems rely on ratings as ground truth, yet temporal inconsistency and individual rating styles may contaminate that signal.
The conventional reason recommender systems prefer explicit ratings (star ratings, thumbs up/down) over implicit feedback (clicks, watch time) is that explicit ratings are clean preference data. The user is directly stating "I like this." Amatriain, Pujol, and Oliver's experimental study evaluates this assumption and finds it doesn't hold.
The study has users rate the same items multiple times across spaced sessions. The same user gives substantially different ratings to the same item depending on when they rate. The variation is not confined to the noise margin: users sometimes shift by multiple stars on the same item across sessions. A star count on a 5-star scale is therefore not a stable property of the user's preference; it depends on mood, context, recently consumed alternatives, and the user's rating style at that moment.
The noise comes from multiple sources:
- Temporal inconsistency: the user's true preference may have shifted, but more often the rating itself fluctuates around a stable preference.
- Rater-specific style: some users use the full scale, some use only the top half, and these styles drift.
- Anchoring effects: a rating depends on what other items the user has recently rated.
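Of these sources, rater-specific style is the one with a familiar partial correction: standardizing each user's ratings against that user's own mean and spread. The sketch below assumes a simple nested-dictionary layout and hypothetical users; it is not a method from the paper, and because it treats style as fixed, it would not capture the drift in style noted above.

```python
from statistics import mean, pstdev


def standardize_per_user(ratings):
    """Convert each user's raw ratings to z-scores against that user's own scale use.

    ratings: dict mapping user -> dict mapping item -> raw rating.
    """
    standardized = {}
    for user, items in ratings.items():
        values = list(items.values())
        mu = mean(values)
        sigma = pstdev(values)
        standardized[user] = {
            # A user who gives one constant rating carries no ordering information.
            item: (rating - mu) / sigma if sigma > 0 else 0.0
            for item, rating in items.items()
        }
    return standardized


# Hypothetical users: one uses the full scale, one uses only the top half.
raw = {
    "full_scale_user": {"a": 1, "b": 3, "c": 5},
    "top_half_user": {"a": 3, "b": 4, "c": 5},
}
z = standardize_per_user(raw)
```

In this toy case both users end up with identical z-scores, because they rank the three items the same way and differ only in how much of the scale they use.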
The implication for recommender systems: rating data is preference data plus rating noise plus rater style, and conflating them produces biased models. Treating "5 stars" as a categorical label for "liked" understates the noise; treating the difference between 4 and 5 stars as meaningful overstates user precision. The paper undermines the cleanliness assumption that justified the field's preference for explicit ratings. Combined with the implicit-feedback availability and self-selection issues raised elsewhere in the literature, this suggests the choice between explicit and implicit signals is more nuanced than the methodological canon admits.
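The "preference plus noise plus style" reading can be made concrete with a small simulation. The sketch below assumes an additive model with independent Gaussian rating noise and continuous, unclipped ratings; these are simplifying assumptions chosen for illustration, not the paper's model, and every parameter value is hypothetical. Under those assumptions, two ratings of the same item by the same user disagree with an RMSE near noise_sd × √2, and even a predictor that knew the stable part exactly would still score an RMSE near noise_sd.

```python
import math
import random


def simulate(n_pairs=20000, noise_sd=0.7, seed=0):
    """Simulate two rating trials under rating = preference + style + noise."""
    rng = random.Random(seed)
    retest_sq = 0.0
    oracle_sq = 0.0
    for _ in range(n_pairs):
        preference = rng.uniform(1.0, 5.0)  # latent preference (assumed range)
        style = rng.gauss(0.0, 0.5)         # rater-style offset, held fixed across trials
        stable = preference + style
        rating_1 = stable + rng.gauss(0.0, noise_sd)
        rating_2 = stable + rng.gauss(0.0, noise_sd)
        retest_sq += (rating_1 - rating_2) ** 2
        oracle_sq += (rating_1 - stable) ** 2  # error of a predictor that knows `stable`
    return math.sqrt(retest_sq / n_pairs), math.sqrt(oracle_sq / n_pairs)


retest_rmse, oracle_rmse = simulate()
print(f"test-retest RMSE: {retest_rmse:.3f}")
print(f"oracle RMSE:      {oracle_rmse:.3f}")
```

The design point is that the oracle error is not zero: when a model is scored against a single noisy rating, part of its measured error belongs to the rating rather than to the model.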
Inquiring lines that use this note as a source (19)
This note is a source for these synthesized inquiries.
- How does Netflix decide which rows appear and in what order on the homepage?
- Why do some Netflix rows cache results while others require fresh signals?
- Why do users naturally express recommendation critiques instead of positive preferences?
- Can graded relevance assumptions hold when user ratings are temporally inconsistent?
- Can recommender systems separate true preference from individual rating style bias?
- Why do explicit ratings fail to capture uncertainty in user preferences?
- How should unobserved items differ from items rated zero preference?
- Can recommender systems correct for ratings that have been socially shaped?
- Why is the Judging preference constant while other traits vary slightly?
- Why do online ratings fail to represent independent individual preferences?
- How do different audience segments rate the same product differently?
- Can platforms predict which recommender type will stabilize ratings?
- What feedback loops form between recommender choice and review data?
- How much do individual ratings influence future ratings in networks?
- How do per-user concept drift and per-period periodicity combine in time-varying preferences?
- How do rating anchors shift meaning within short temporal windows for individual users?
- Should recommenders discard old user data uniformly or selectively retain historical signals?
- Why do shared accounts create heterogeneous preference drift within single user profiles?
- Why do users trust some recommenders more than others?
Related concepts in this collection (4)
- Can implicit feedback reveal both preference and confidence?
  When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
  extends: noisy explicit ratings make the case for implicit feedback's preference-plus-confidence structure stronger
- Do online reviews actually measure product quality or just buyer preferences?
  Online reviews come only from customers who already expected to like a product. This self-selection might hide the true quality signal beneath layers of preference bias and writing motivation. What can aggregated ratings actually tell us?
  complements: rating noise compounds with self-selection bias; both undermine the "ratings as ground truth" assumption
- Why do online reviewers publish negative ratings despite positive experiences?
  When people post reviews publicly, do they adjust their honest opinions to seem more discerning? Schlosser's experiments test whether audience awareness shifts how people rate products compared to private ratings.
  complements: rater style and audience effects together describe how the same private preference can produce wildly different public ratings
- Do online ratings actually reflect independent customer opinions?
  How much do previously posted ratings shape the ones that come after, and does this social influence distort what ratings supposedly measure? Understanding this matters for anyone relying on review aggregates to judge product quality.
  extends: the noise described here is within-user; social dynamics add a between-user noise component
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Collaborative Filtering with Temporal Dynamics
- HyperBandit: Contextual Bandit with Hypernetwork for Time-Varying User Preferences in Streaming Recommendation
- Augmenting Netflix Search with In-Session Adapted Recommendations
- On Information Distortions in Online Ratings
- Why Do People Rate? Theory and Evidence on Online Ratings
- Collaborative Filtering for Implicit Feedback Datasets
- The Netflix Recommender System: Algorithms, Business Value, and Innovation
- Calibrated Recommendations
Original note title
explicit user ratings are noisy — temporal inconsistency and rater idiosyncrasy contaminate the supposed ground truth