SYNTHESIS NOTE

Where do recommendation biases come from in language models?

Do LLM-based recommenders inherit systematic biases from pretraining that differ fundamentally from those of traditional collaborative filtering (CF) systems? Understanding these sources matters for building fairer, more accurate recommendations.

Synthesis note · 2026-05-03 · sourced from Recommenders General

The Wu et al. survey identifies three biases that LLM-based recommendation systems exhibit but traditional recommenders don't. These biases are inherited from the underlying language model and propagate into recommendation behavior under every integration approach, although the related concepts listed below indicate that the degree varies by paradigm.

Position bias: when item candidates are presented as a textual sequence in the prompt, the LLM systematically prefers items appearing earlier in the order, regardless of actual relevance. The bias comes from the language modeling objective — early tokens have stronger influence on what the model attends to. The same items in different orderings produce different recommendations.
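One way to probe the claim that the same items in different orderings produce different recommendations is a permutation test. The sketch below is a minimal illustration under stated assumptions, not the survey's method: `first_biased_ranker` and `relevance_ranker` are hypothetical stand-ins for an LLM call, and the `RELEVANCE` scores are made up for the example.

```python
import itertools
from collections import Counter

# Hypothetical relevance scores, invented for this illustration only.
RELEVANCE = {"a": 0.1, "b": 0.9, "c": 0.5}


def first_biased_ranker(candidates):
    """Toy stand-in for a fully position-biased model: picks the first listed item."""
    return candidates[0]


def relevance_ranker(candidates):
    """Toy stand-in for an order-invariant model: picks the most relevant item."""
    return max(candidates, key=RELEVANCE.get)


def pick_counts(ranker, candidates):
    """Count how often each item is picked across all orderings of the same set."""
    picks = Counter()
    for order in itertools.permutations(candidates):
        picks[ranker(list(order))] += 1
    return picks


def first_slot_rate(ranker, candidates):
    """Fraction of orderings in which the pick is whatever sat in the first slot."""
    orders = list(itertools.permutations(candidates))
    hits = sum(1 for order in orders if ranker(list(order)) == order[0])
    return hits / len(orders)


if __name__ == "__main__":
    items = ["a", "b", "c"]
    for ranker in (first_biased_ranker, relevance_ranker):
        print(ranker.__name__, dict(pick_counts(ranker, items)), first_slot_rate(ranker, items))
```

For an order-invariant ranker the first-slot rate stays near 1/n for n candidates, because the best item lands in the first slot only by chance. A rate well above that suggests position is driving the choice. With a real model, the number of permutations grows factorially, so sampling a fixed number of shuffles would be the practical variant.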

Popularity bias: the LLM has seen popular items mentioned more frequently in pretraining corpora, so it tends to rank them higher in any recommendation list. This is more pervasive than CF popularity bias because it comes from the world's text rather than from interaction data. Items famous in news, social media, or product reviews get over-recommended whether or not they are relevant. Mitigating it at the source is hard because that would require changing the pretraining corpus, which is upstream of the recommendation deployment.
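Over-recommendation of famous items can be quantified as the share of recommendation slots filled by the most-mentioned items. The sketch below assumes a hypothetical `mention_counts` mapping as a proxy for how often each item appears in text. The note does not specify a metric, so `head_share` is an illustrative choice, not the survey's definition.

```python
def head_share(recommendations, mention_counts, head_size=1):
    """Share of recommended slots filled by the `head_size` most-mentioned items.

    recommendations: list of recommendation lists, one per user or query.
    mention_counts:  hypothetical proxy for item frequency in pretraining text.
    """
    ranked = sorted(mention_counts, key=mention_counts.get, reverse=True)
    head = set(ranked[:head_size])
    slots = [item for rec_list in recommendations for item in rec_list]
    if not slots:
        return 0.0
    return sum(item in head for item in slots) / len(slots)


if __name__ == "__main__":
    # Made-up data: one blockbuster and four niche items.
    mention_counts = {"blockbuster": 1000, "n1": 50, "n2": 40, "n3": 30, "n4": 20}
    recs = [["blockbuster", "n1"], ["blockbuster", "n2"], ["blockbuster", "n3"]]
    print(head_share(recs, mention_counts))
```

Comparing this share against the head's share in ground-truth relevant items for the same users gives a rough sense of how much of the skew is bias as opposed to genuine preference.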

Fairness bias: pretrained language models exhibit fairness issues related to sensitive attributes (gender, race, age) that reflect training data demographics. These biases pass through into recommendations, where the LLM might systematically give different recommendations to users it perceives as belonging to certain demographic groups.
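A common way to check for this kind of behavior is a counterfactual probe: keep the request fixed, change only the stated sensitive attribute, and compare the resulting lists. The sketch below is illustrative only. `neutral_recommender` and `skewed_recommender` are hypothetical stand-ins for an LLM call, and Jaccard overlap is one possible similarity measure, assumed here and not taken from the source.

```python
import itertools


def jaccard(a, b):
    """Jaccard overlap of two item lists, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def counterfactual_overlap(recommend, request, attribute_values):
    """Pairwise overlap of recommendation lists when only the attribute changes."""
    lists = {value: recommend(request, value) for value in attribute_values}
    return {
        (u, v): jaccard(lists[u], lists[v])
        for u, v in itertools.combinations(attribute_values, 2)
    }


def neutral_recommender(request, attribute):
    """Toy stand-in that ignores the attribute entirely."""
    return ["x", "y", "z"]


def skewed_recommender(request, attribute):
    """Toy stand-in that swaps one item depending on the attribute."""
    return ["x", "y", "z"] if attribute == "group_a" else ["x", "y", "w"]


if __name__ == "__main__":
    groups = ["group_a", "group_b"]
    print(counterfactual_overlap(neutral_recommender, "suggest a film", groups))
    print(counterfactual_overlap(skewed_recommender, "suggest a film", groups))
```

An overlap of 1.0 means the attribute made no difference to the list. Lower values flag pairs worth inspecting, though a difference is not automatically unfair when the attribute is legitimately relevant to the request.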

The implication is that LLM-based recommendation isn't just a more capable variant of conventional recommendation — it's a different beast with its own failure modes. Mitigating these biases isn't about adapting CF debiasing techniques; it requires LLM-specific approaches like balanced prompting, popularity-aware decoding, and fairness-conditioned generation. The research community is still working out the specifics.

Inquiring lines that read this note 29

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How can LLM recommenders match or exceed collaborative filtering performance?

What structural factors drive popularity bias in recommendation systems?

How do language models inherit human biases from training data?

How should we design LLM systems to maintain alignment and control?

How do different LLM integration paradigms affect inheritance of pretraining biases?

Can prompting strategies overcome LLM biases without model fine-tuning?

Can prompt design strategies reduce position bias in language model recommendations?

Why can LLMs generate ideas better than they evaluate them?

Why do review corpora contain biases that affect generated comparisons?

How can recommendation systems balance personalization with stability and coverage?

Can recommender systems separate true preference from individual rating style bias?

How do evaluation biases undermine LLM quality assessment systems?

What other evaluation biases exist in LLM judge systems?

Does RLHF training sacrifice accuracy and grounding for user agreement?

What are the consequences of stacked accommodation biases in LLM predictions?

How does example difficulty affect learning efficiency in language models?

How does the pretraining distribution shape what LLMs find hard?

Related concepts in this collection 4

Where does LLM recommendation bias actually come from? Do conversational AI systems inherit popularity bias from their training data or from the datasets they're deployed on? Understanding the source matters for knowing how to fix it.
exemplifies: empirical instance of the popularity-bias prong — measurable at 5% vs 2% on ReDIAL
Why do language models ignore temporal order in ranking? When LLMs rank items based on interaction history, do they actually use sequence order or treat it as a set? Understanding this gap matters for building effective LLM-based recommenders.
extends: order-blindness is a fourth pretraining-inherited bias adjacent to position bias
How should language models integrate into recommender systems? When building recommendation systems with LLMs, should you use them as feature encoders, token generators, or direct recommenders? The choice affects efficiency, bias, and compatibility with existing pipelines.
complements: each integration paradigm inherits these biases differently — direct generation worst, input-augmentation least
Why do ranking systems need to model selection bias explicitly? Explores how training data from current rankers creates feedback loops that reinforce past decisions. Understanding this mechanism helps explain why naive approaches fail in production ranking systems.
complements: traditional recommender selection-bias and LLM-pretraining biases compose — feedback loops in production can amplify both

Original note title

LLM-based recommendation faces three biases inherited from language model pretraining — position popularity and fairness
