Do LLM movie recommenders actually personalize to individual users?
While LLMs excel at explaining recommendations, do they truly adapt to each user's preferences and taste? A 160-user study tests whether personalized prompting techniques can close the personalization gap.
This online field experiment (160 active users) evaluates LLMs as conversational movie recommenders from the user's perspective. The verdict is mixed: LLMs deliver strong recommendation explainability but fall short on overall personalization, diversity, and user trust. Two findings sharpen the design picture. First, different personalized prompting techniques (few-shot prompting included) do not significantly affect user-perceived quality; what matters more is the number of movies a user has watched, i.e., the richness of context they can provide. Second, LLMs show a greater ability to recommend lesser-known or niche items. Qualitatively, providing personal context and examples is crucial to good recommendations.
The keeper is twofold. There is a gap between what LLM recommenders are good at (explaining, surfacing niche items) and what users actually need (personalization, diversity, trust), and the lever that matters is user-provided context, not clever prompting. This is a user-study reality check on the LLM-recommender hype.
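To make the "context, not clever prompting" point concrete, here is a minimal sketch of a prompt builder that folds a user's watch history into a plain recommendation request. Everything in it is an assumption for illustration: `WatchedMovie`, `build_prompt`, the 5-point rating scale, and the prompt wording are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class WatchedMovie:
    title: str
    rating: float  # the user's own rating; a 5-point scale is assumed here


def build_prompt(history: list[WatchedMovie], request: str, max_examples: int = 10) -> str:
    """Assemble a recommendation prompt from user-provided context (hypothetical format)."""
    # Put the highest-rated titles first so the strongest taste signals
    # survive if the history has to be truncated to max_examples.
    top = sorted(history, key=lambda m: m.rating, reverse=True)[:max_examples]

    lines = ["You are a movie recommender."]
    if top:
        lines.append("Movies the user has watched and rated:")
        lines.extend(f"- {m.title} ({m.rating}/5)" for m in top)
    else:
        # With no history, the model has only the request itself to go on.
        lines.append("The user has shared no watch history.")
    lines.append(f"Request: {request}")
    lines.append("Recommend 5 movies and explain each choice briefly.")
    return "\n".join(lines)
```

The prompt template stays fixed; the only thing that varies per user is how much history is available to fill it, which is the variable the study found to matter more than the prompting technique.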
This sits in the vault's conversational-recommendation thread. It aligns with "Do LLMs in conversational recommendation systems use collaborative or content knowledge?" (these systems draw on content rather than collaborative signal, hence the weak personalization). Its finding that few-shot prompting doesn't help also echoes the broader point of "Does learning from mistakes improve in-context learning?": more examples aren't automatically better.
Related concepts in this collection 2
- Do LLMs in conversational recommendation systems use collaborative or content knowledge?
  Conversational recommenders powered by LLMs might rely on either collaborative signals (user interaction patterns) or content/context knowledge (semantic understanding). Understanding which signal dominates would reveal how to design and deploy these systems effectively.
  explains the weak personalization: content not collaborative signal
- Does learning from mistakes improve in-context learning?
  Explores whether inducing models to make errors on few-shot examples, then having them articulate principles from those mistakes, leads to better performance than learning from correct examples alone.
  both find more/standard few-shot prompting isn't automatically better
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Large Language Models as Conversational Movie Recommenders: A User Study
- Understanding the Role of User Profile in the Personalization of Large Language Models
- CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
- Personalized Language Modeling from Personalized Human Feedback
- Large Language Models for User Interest Journeys
- User-LLM: Efficient LLM Contextualization with User Embeddings
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
Original note title
LLM conversational recommenders offer strong explainability but lack personalization diversity and trust and few-shot prompts do not help