Can users steer recommendations with natural language at inference?

Can recommendation systems let users specify their preferences in natural language at inference time without retraining? This matters because it would let both new and existing users dynamically adjust what they see.

Synthesis note · 2026-05-03 · sourced from Recommenders Personalized

Sequential recommenders predict a user's next interaction from history. Recent work uses LLMs to extract preferences from reviews and feed them as auxiliary supervision during training, but this approach can't be steered at inference: the user's preferences are baked into the model weights, so a new user requires fine-tuning to be served well.

Preference discerning is a different paradigm. Instead of training the model to embody preferences, it conditions the generative recommender on user preferences as text in the model's context window at inference time. An LLM extracts preferences from user reviews and item-specific data, producing a textual description of what the user wants. This text is fed into the sequential recommender as in-context conditioning, alongside the interaction history.
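The conditioning step described above can be sketched as follows. This is a minimal illustration, not Mender's actual input format: the section markers, the function names `build_extraction_prompt` and `build_model_input`, and the `max_history` cutoff are all assumptions made for the example. The LLM call itself is left out; only the text going into it and the text handed to the recommender are shown.

```python
from typing import List


def build_extraction_prompt(reviews: List[str]) -> str:
    """Prompt asking an LLM to summarise a user's preferences (hypothetical wording)."""
    bullet_list = "\n".join(f"- {r.strip()}" for r in reviews)
    return (
        "Summarise what this user likes and dislikes in one or two sentences.\n"
        "Reviews:\n" + bullet_list
    )


def build_model_input(preferences: str, history: List[str], max_history: int = 20) -> str:
    """Place the preference text next to the interaction history in one context string."""
    # Keep only the most recent interactions (none at all if max_history <= 0).
    recent = history[-max_history:] if max_history > 0 else []
    lines = ["[PREFERENCES]", preferences.strip(), "[HISTORY]"]
    lines += [f"{i + 1}. {item}" for i, item in enumerate(recent)]
    lines.append("[NEXT ITEM]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical preference summary and item identifiers.
    prefs = "Prefers fast-paced action; avoids romance subplots."
    print(build_model_input(prefs, ["item_17", "item_42", "item_08"]))
```

Because the preference text is an ordinary argument, a new user or a changed preference only requires building a new input string; nothing about the recommender itself has to change.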

The architectural shift unlocks several capabilities. Users can specify in natural language what they want or want to avoid ("more action, less romantic"). New users can be served without retraining by computing their preferences from minimal data and injecting them into context. The system can be evaluated on preference-following capability, not just next-item prediction: Mender's benchmark covers preference-based recommendation, sentiment following, fine-grained steering, coarse-grained steering, and history consolidation. State-of-the-art sequential recommenders fail on several of these axes because they have no mechanism for incorporating preferences they were not trained on; Mender succeeds because preferences are a runtime input, not a training target.
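To make the "runtime input, not a training target" point concrete, here is a toy sketch of steering. It is not Mender: a hand-written tag scorer stands in for the learned recommender, and the catalog, the tags, and the `more X` / `less Y` parser are all invented for illustration. What it shows is only that the ranking changes when the preference text changes, while everything else stays fixed.

```python
import re
from typing import Dict, List, Set, Tuple


def parse_preferences(text: str) -> Tuple[Set[str], Set[str]]:
    """Toy parser: collect the word following each 'more' and each 'less'."""
    lowered = text.lower()
    wanted = set(re.findall(r"\bmore\s+(\w+)", lowered))
    unwanted = set(re.findall(r"\bless\s+(\w+)", lowered))
    return wanted, unwanted


def rank(catalog: Dict[str, Set[str]], preference_text: str) -> List[str]:
    """Rank items by (wanted tags matched) minus (unwanted tags matched)."""
    wanted, unwanted = parse_preferences(preference_text)

    def score(name: str) -> int:
        tags = catalog[name]
        return len(tags & wanted) - len(tags & unwanted)

    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(catalog, key=lambda name: (-score(name), name))


# Hypothetical catalog: item names and tags are invented for the example.
CATALOG = {
    "film_a": {"action", "crime"},
    "film_b": {"romantic", "comedy"},
    "film_c": {"action", "romantic"},
}

if __name__ == "__main__":
    print(rank(CATALOG, "more action, less romantic"))
    print(rank(CATALOG, "more romantic, less action"))
```

Only the preference string differs between the two calls. The catalog and the scoring function, which play the role of fixed model weights here, are untouched, yet the order of the results reverses.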

The general lesson: making something a context input rather than a parameter target trades efficiency (longer prompts) for flexibility (runtime steering). For tasks where users know better than the training set what they want, the trade is worth it.

Inquiring lines that read this note (5)

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should conversational agents balance goal-driven initiative with user control?

Can users articulate what they want before AI helps them discover it?

How can recommendation systems balance personalization with stability and coverage?

How should dialogue recommender systems manage conversation history and state?

How much context length can sequential recommenders handle before steering degrades?

Related concepts in this collection (4)

Can language models bridge the gap between critique and preference? When users express what they dislike rather than what they want, can LLMs reliably transform those critiques into positive preferences that retrieval systems can actually use?
complements: both let users steer recommendations via natural language at inference; preference discerning starts from positive preferences while critiques start from negative ones

Can user preferences be learned from just ten questions? Explores whether adaptive question selection can efficiently infer user-specific reward coefficients without historical data or fine-tuning. This matters for scaling personalization without per-user model updates.
complements: PReF and Mender both achieve inference-time alignment without fine-tuning: PReF via reward factorization, Mender via NL conditioning

Can text summaries beat embeddings for personalized reward models? When training reward models on diverse user preferences, does conditioning on learned text-based summaries of user preferences outperform embedding vectors? This matters because better representations could make personalization more interpretable and portable.
extends: text-based preference conditioning beats embedding conditioning at the reward-model level too; same insight in alignment

Can conversational recommenders recover lost preference signals from history? Conversational recommenders abandoned item and user similarity signals when they shifted to dialogue-focused design. Can integrating historical sessions and look-alike users restore these channels without losing dialogue benefits?
complements: NL preferences from reviews are a fourth preference channel; text-distilled preferences abstract over individual interactions
