SYNTHESIS NOTE
Recommender Systems

Why do the same users rate items differently each time?

User ratings are assumed to be clean preference signals, but do they actually fluctuate unpredictably? This matters because recommender systems rely on ratings as ground truth, yet temporal inconsistency and individual rating styles may contaminate that signal.

Synthesis note · 2026-05-03 · sourced from Recommenders General

The conventional reason recommender systems prefer explicit ratings (star ratings, thumbs up/down) over implicit feedback (clicks, watch time) is that explicit ratings are clean preference data. The user is directly stating "I like this." Amatriain, Pujol, and Oliver's experimental study evaluates this assumption and finds it doesn't hold.

The study has users rate the same items multiple times across spaced sessions. The same user gives substantially different ratings to the same item depending on when they rate. The variation is not confined to the noise margin: users sometimes shift by multiple stars on the same item across sessions. A star count on a 5-star scale is therefore not a stable property of the user's preference; it depends on mood, context, recently consumed alternatives, and the user's rating style at that moment.
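One way to quantify this kind of test-retest inconsistency is sketched below. It is a minimal illustration, not the study's actual procedure: the function name `retest_stats`, the choice of summary statistics, and the toy rating pairs are all assumptions made for the example.

```python
from math import sqrt


def retest_stats(pairs):
    """Summarize disagreement between two ratings of the same item.

    pairs: list of (first_rating, second_rating) tuples, one per
    user-item combination that was rated in two separate sessions.
    """
    if not pairs:
        raise ValueError("need at least one repeated rating pair")
    n = len(pairs)
    # Root-mean-square difference between the two sessions.
    rmse = sqrt(sum((a - b) ** 2 for a, b in pairs) / n)
    # Share of pairs where the rating changed at all.
    changed = sum(1 for a, b in pairs if a != b) / n
    # Share of pairs where the rating moved by two or more stars.
    multi_star = sum(1 for a, b in pairs if abs(a - b) >= 2) / n
    return {
        "rmse": rmse,
        "changed_fraction": changed,
        "multi_star_fraction": multi_star,
    }


# Hypothetical data: four user-item pairs rated in two sessions.
pairs = [(4, 4), (5, 3), (2, 3), (4, 5)]
stats = retest_stats(pairs)
print(stats)
```

A nonzero `multi_star_fraction` corresponds to the multi-star shifts described above; the RMSE gives a single number for how far repeated ratings drift apart on average.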

The noise comes from multiple sources. Temporal inconsistency: the user's true preference may have shifted, but more often the rating itself fluctuates around a stable preference. Rater-specific style: some users use the full scale, some use only the top half, and these styles drift. Anchoring effects: a rating depends on what other items the user has recently rated.
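The rater-style component can be partly separated from preference by normalizing each user's ratings against that user's own mean and spread. The sketch below uses per-user z-scores, a common normalization that is assumed here for illustration and is not taken from the study; the function name `normalize_per_user` and the two toy users are hypothetical. It treats style as fixed per user, so it does not address styles that drift over time.

```python
from statistics import mean, pstdev


def normalize_per_user(ratings):
    """Convert raw ratings to per-user z-scores.

    ratings: dict mapping user -> dict mapping item -> raw rating.
    A user whose ratings are all identical gets 0.0 for every item.
    """
    normalized = {}
    for user, items in ratings.items():
        values = list(items.values())
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        normalized[user] = {
            item: (r - mu) / sigma if sigma > 0 else 0.0
            for item, r in items.items()
        }
    return normalized


# Hypothetical users with the same item ordering but different styles:
# one uses the full 1-5 scale, the other only the top half.
toy = {
    "full_scale": {"a": 1, "b": 3, "c": 5},
    "top_half": {"a": 3, "b": 4, "c": 5},
}
z = normalize_per_user(toy)
print(z)
```

After normalization the two toy users produce the same scores for each item, even though their raw stars differ, which is the sense in which style can be factored out of the signal.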

The implication for recommender systems: rating data is preference data plus rating noise plus rater style, and conflating them produces biased models. Treating "5 stars" as a categorical label for "liked" understates the noise; treating the difference between 4 and 5 stars as meaningful overstates user precision. The paper undermines the cleanliness assumption that justified the field's preference for explicit ratings. Combined with the availability of implicit feedback and the self-selection issues raised elsewhere in the literature, this suggests the choice between explicit and implicit signals is more nuanced than the methodological canon admits.
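The "preference plus noise plus style" reading can be written as a simple additive model. This is an illustrative assumption, not a formula from the paper: the symbols $p_{u,i}$ (stable preference of user $u$ for item $i$), $b_u$ (rater-style offset), and $\varepsilon_{u,i,t}$ (session-level noise) are introduced here, and the derivation assumes the noise is zero-mean, independent across sessions, and that $b_u$ does not drift between the two sessions.

```latex
\begin{align}
r_{u,i,t} &= p_{u,i} + b_u + \varepsilon_{u,i,t},
  \qquad \mathbb{E}[\varepsilon_{u,i,t}] = 0,\quad
  \operatorname{Var}(\varepsilon_{u,i,t}) = \sigma^2 \\
r_{u,i,t_1} - r_{u,i,t_2} &= \varepsilon_{u,i,t_1} - \varepsilon_{u,i,t_2} \\
\mathbb{E}\!\left[(r_{u,i,t_1} - r_{u,i,t_2})^2\right] &= 2\sigma^2
\end{align}
```

Under these assumptions the preference and style terms cancel when the same user rates the same item twice, so the disagreement between sessions isolates the noise term. A model fitted to a single rating per user-item pair cannot make that separation.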

Original note title

explicit user ratings are noisy — temporal inconsistency and rater idiosyncrasy contaminate the supposed ground truth