SYNTHESIS NOTE
Recommender Systems

Can aggregate reward models satisfy genuinely disagreeing users?

When users have conflicting preferences, do aggregate reward models face an impossible choice between satisfying majorities or sampling proportionally? What does this reveal about RLHF deployment?

Synthesis note · 2026-05-18 · sourced from Recommenders Personalized

A clean argument for why aggregate reward models cannot serve disagreement-heavy tasks. Consider a subjective question where 51% of the target audience prefer answer A and 49% prefer answer B. With a single reward model trained on aggregated preferences, the deployment has exactly two options. Pick A as the preferred answer: 49% of users are unhappy 100% of the time. Sample A and B proportionally to their preference rates: 100% of users are unhappy approximately half the time (A-preferrers on the 49% of responses that are B, B-preferrers on the 51% that are A). Both options are unsatisfactory.
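The arithmetic behind the two options can be checked with a minimal sketch. The function name `unhappy_rates` and its parameters are hypothetical, and the sketch assumes a user is unhappy exactly when shown the answer they do not prefer.

```python
def unhappy_rates(share_prefers_a: float, prob_answer_a: float) -> dict:
    """Fraction of responses each group dislikes under a fixed policy.

    share_prefers_a: share of users who prefer answer A (assumed known).
    prob_answer_a:   probability that the deployed system answers A.
    """
    share_prefers_b = 1.0 - share_prefers_a
    unhappy_a = 1.0 - prob_answer_a  # A-preferrers dislike every B response
    unhappy_b = prob_answer_a        # B-preferrers dislike every A response
    return {
        "prefers_A": unhappy_a,
        "prefers_B": unhappy_b,
        "population_average": share_prefers_a * unhappy_a
        + share_prefers_b * unhappy_b,
    }


# Option 1: always answer A (the majority preference).
majority = unhappy_rates(share_prefers_a=0.51, prob_answer_a=1.0)

# Option 2: sample A and B in proportion to their preference rates.
proportional = unhappy_rates(share_prefers_a=0.51, prob_answer_a=0.51)

print(majority)
print(proportional)
```

Under these assumptions the population-average dissatisfaction is close to one half in both cases (0.49 versus about 0.4998). The two options differ only in how that dissatisfaction is distributed: concentrated on the minority, or spread across everyone.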

The structural problem is that aggregate reward models compress preference distributions into single scalars (or single rankings) that cannot represent disagreement. They reward what the majority prefers and incidentally suppress what the minority prefers. For tasks with high consensus this is fine — the majority preference is everyone's preference. For tasks with genuine disagreement — subjective evaluations, value-laden topics, creative judgment, cultural-context-dependent choices — aggregate models systematically exclude the minority view.
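One way to see the compression is a small sketch under an assumed Bradley-Terry preference model, which the note itself does not specify. Two hypothetical populations, one sharply split and one where every user is nearly indifferent, produce the same pooled win rate and therefore the same fitted reward gap. The names `bt_reward_gap`, `split_population`, and `mild_population` are illustrative only.

```python
import math


def bt_reward_gap(win_rate_a: float) -> float:
    """Reward gap r_A - r_B that a Bradley-Terry model fits to a pooled win rate."""
    return math.log(win_rate_a / (1.0 - win_rate_a))


def pooled_win_rate(user_probs: list) -> float:
    """Pooled probability that a randomly drawn user prefers A."""
    return sum(user_probs) / len(user_probs)


# Hypothetical population 1: genuine disagreement.
# 51 users always prefer A, 49 users always prefer B.
split_population = [1.0] * 51 + [0.0] * 49

# Hypothetical population 2: consensus on near-indifference.
# Every user prefers A with probability 0.51.
mild_population = [0.51] * 100

gap_split = bt_reward_gap(pooled_win_rate(split_population))
gap_mild = bt_reward_gap(pooled_win_rate(mild_population))

print(gap_split, gap_mild)
```

Both gaps come out at roughly 0.04, so the scalar reward carries no trace of which population produced the labels.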

This is not a quality problem with current reward models. It is a representational problem with the aggregation step itself. Even a perfect aggregate reward model would face this dilemma. The fix has to operate at a different level: reward models that can be specialized to individual users (or to user groups whose preferences cluster) rather than averaged across the population.
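The difference between the two levels can be illustrated with a toy sketch. It assumes each group's true reward is known exactly, so the aggregate model here is "perfect" in the sense used above. The tables `GROUP_REWARDS` and `GROUP_SHARES` and the helper names are hypothetical.

```python
# Assumed true rewards per preference group (illustrative values).
GROUP_REWARDS = {
    "prefers_A": {"A": 1.0, "B": 0.0},
    "prefers_B": {"A": 0.0, "B": 1.0},
}
# Assumed population shares, matching the 51/49 example.
GROUP_SHARES = {"prefers_A": 0.51, "prefers_B": 0.49}
ANSWERS = ["A", "B"]


def aggregate_reward(answer: str) -> float:
    """Population-weighted average reward: a perfect aggregate model."""
    return sum(GROUP_SHARES[g] * GROUP_REWARDS[g][answer] for g in GROUP_SHARES)


def group_reward(group: str, answer: str) -> float:
    """Reward specialized to one group (or one user cluster)."""
    return GROUP_REWARDS[group][answer]


# A single aggregate model yields one choice for the whole population.
aggregate_choice = max(ANSWERS, key=aggregate_reward)

# Specialized models yield one choice per group.
per_group_choice = {
    g: max(ANSWERS, key=lambda a, g=g: group_reward(g, a)) for g in GROUP_REWARDS
}

print(aggregate_choice)
print(per_group_choice)
```

Even with error-free rewards, the aggregate model selects A for everyone, while the specialized models select each group's own preferred answer.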

The implication extends beyond personalization. Whenever a system is deployed against a heterogeneous user base with genuinely divergent preferences, the standard "train one model to satisfy everyone" architecture cannot deliver on that goal: it either fully satisfies only the majority or partially satisfies everyone. The right architecture either splits per-user (personalization) or splits per-cluster (group-level adaptation). Aggregate reward modeling becomes appropriate only when the underlying preferences are actually unimodal, which is a stronger assumption than RLHF deployments typically test.
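A rough check of the unimodality assumption could look like the sketch below. It assumes per-user preference rates are available (for example from repeated comparisons by the same user), which pooled labels alone do not provide. The `disagreement_index` heuristic is an illustration, not an established test: it compares the between-user variance with the maximum variance possible at the same pooled rate.

```python
from statistics import fmean, pvariance


def disagreement_index(user_rates: list) -> float:
    """Heuristic in [0, 1]: between-user variance of preference rates,
    divided by the largest variance possible at the same pooled rate.

    Near 0: users share one preference distribution (unimodal).
    Near 1: users split into opposed camps.
    """
    pooled = fmean(user_rates)
    max_variance = pooled * (1.0 - pooled)
    if max_variance == 0.0:
        return 0.0  # everyone always picks the same answer
    return pvariance(user_rates) / max_variance


# Hypothetical per-user rates of preferring A over B.
split_users = [1.0] * 51 + [0.0] * 49  # opposed camps
mild_users = [0.51] * 100              # shared near-indifference

print(disagreement_index(split_users))
print(disagreement_index(mild_users))
```

Both hypothetical populations have the same pooled preference rate, yet the index separates them: about 1 for the opposed camps and 0 for the shared distribution. A check of this kind needs user-level data, which is the information the aggregation step discards.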

Original note title

aggregate reward models systematically exclude minority preferences — the dilemma of preferred answer or proportional sampling is a structural failure of one-size-fits-all RLHF