Do user outputs outperform inputs for LLM personalization?
Does a user's history of outputs (responses, endorsed content) matter more for personalization than their input queries? This explores what actually drives effective personalization in language models.
A study on user profile roles in LLM personalization surfaces a counterintuitive finding: the outputs users have produced or endorsed matter far more than the inputs they submitted. Using only the output part of user profiles achieves comparable or even superior performance to complete profiles across multiple LaMP tasks. Using only the input part leads to noticeable degradation.
This finding separates personalization from two adjacent paradigms:
Personalization ≠ RAG. Retrieval-augmented generation relies on semantic similarity between the input query and retrieved documents. Personalization works through a different mechanism — it is the style, preferences, and judgments expressed in historical responses that calibrate the model, not the semantic content of past queries.
Personalization ≠ ICL. In-context learning uses complete input-output pairs as demonstrations. Personalization requires only the output side — the response patterns that reveal who the user is and what they value.
The practical implication: when designing personalization systems under input length constraints, prioritize incorporating user-generated or user-approved responses over query histories. This unlocks the potential to include many more user profiles within limited context windows, because output-only profiles are both more effective and more compact than complete interaction histories.
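The compactness half of this argument can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ProfileEntry` type, the `pack_profile` helper, the whitespace word count standing in for a real tokenizer, and the sample history. It shows only that output-only entries fit more history into a fixed budget; it says nothing about downstream quality, which is what the study measures.

```python
from dataclasses import dataclass


@dataclass
class ProfileEntry:
    """One past interaction (hypothetical structure, not from the study)."""
    input_text: str   # what the user submitted or was shown
    output_text: str  # what the user wrote or endorsed


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace word count. A real system would use
    # the target model's tokenizer instead.
    return len(text.split())


def pack_profile(entries, budget, outputs_only=True):
    """Greedily add entries in order until the token budget is exhausted."""
    packed, used = [], 0
    for entry in entries:
        if outputs_only:
            text = entry.output_text
        else:
            text = f"{entry.input_text} -> {entry.output_text}"
        cost = count_tokens(text)
        if used + cost > budget:
            break
        packed.append(text)
        used += cost
    return packed


# Made-up headline-writing history, for illustration only.
history = [
    ProfileEntry(
        "The city council voted on Tuesday to approve a sweeping zoning overhaul",
        "Council backs zoning overhaul",
    ),
    ProfileEntry(
        "Researchers reported a new battery chemistry that charges in five minutes flat",
        "Five-minute battery, finally",
    ),
    ProfileEntry(
        "The local team lost its third straight game after a late collapse",
        "Another late collapse sinks team",
    ),
]

outputs_only = pack_profile(history, budget=20, outputs_only=True)
full_pairs = pack_profile(history, budget=20, outputs_only=False)
print(len(outputs_only), len(full_pairs))
```

With this toy budget, all three output-only entries fit, while only one full input-output pair does.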
A secondary finding adds a structural dimension: user profiles placed closer to the beginning of the input context influence personalization more than those placed elsewhere. This parallels the positional bias documented in ICL (see "How much does demo position alone affect in-context learning accuracy?"). The position effect therefore appears to be domain-general, bearing on where to place user profiles as well as on few-shot learning.
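As a rough sketch of what profile-first placement could look like in practice, the helper below assembles a prompt with the profile block either before or after the task. The `build_prompt` function, its template wording, and the sample texts are all hypothetical; the study's actual prompt format is not given in this note.

```python
def build_prompt(profile_outputs, instruction, query, profile_first=True):
    """Assemble a prompt, optionally placing the user profile before the task."""
    # Hypothetical template: the header wording is an assumption.
    profile_block = "Texts this user has written or endorsed:\n" + "\n".join(
        f"- {text}" for text in profile_outputs
    )
    task_block = f"{instruction}\n{query}"
    if profile_first:
        blocks = [profile_block, task_block]
    else:
        blocks = [task_block, profile_block]
    return "\n\n".join(blocks)


profile = ["Council backs zoning overhaul", "Five-minute battery, finally"]
instruction = "Write a headline for the article below."
query = "Article: The library will extend its weekend hours."

early = build_prompt(profile, instruction, query, profile_first=True)
late = build_prompt(profile, instruction, query, profile_first=False)
print(early)
```

Keeping placement behind a single flag makes it cheap to compare both orderings on a given task rather than assuming the effect transfers unchanged.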
The output-over-input finding connects to the broader question of what personalization is. The PLUS approach discussed in "Can text summaries beat embeddings for personalized reward models?" trains a summarizer to extract preference dimensions, rather than topic summaries, from user history. The finding vindicates that choice: preference dimensions are properties of outputs, not inputs.
Inquiring lines that use this note as a source (35)
This note is a source for these synthesized inquiries.
- Why does belief-specific tailoring work better than demographic personalization?
- Can aspect-augmentation help when user history is sparse or cold?
- How did Netflix's page generation algorithm evolve from rule-based to fully personalized?
- What makes historical user outputs more effective for personalization than semantic similarity?
- What latent dimensions matter most for content creators?
- Does personalization itself actually improve persuasion beyond post-training effects?
- What surface features do LLMs rely on when judging response quality?
- Why do one-shot studies fail to capture personalization effects?
- Which personalization techniques expose user data most directly?
- How should moderator LLMs decide which speakers to query per topic?
- What specific information must be exported from the language system?
- How much user interaction data is needed for effective AI personalization?
- How should aspect selection adapt across different item categories and users?
- Why do users experience LLMs as peers rather than statistical tools?
- How do you attribute copyright when billions of inputs shape one model?
- Why does profile position in context windows affect personalization strength?
- How does personalization differ mechanically from retrieval-augmented generation?
- Can preference dimensions extracted from outputs replace topic-based user summaries?
- How do input length constraints reshape personalization system design choices?
- How do personalization errors differ from general accuracy problems in summaries?
- How do different personalization levels affect persuasion system design and effectiveness?
- What production costs does personalization infrastructure impose on AI systems?
- Does semantic memory improve AI personalization more than episodic memory?
- Do similar user profiles create worse personalization errors than random ones?
- Why does personalization increase both trust and privacy concerns?
- Can abstract preference summaries substitute for specific user interaction history?
- When does combining episodic and semantic memory reduce personalization performance?
- What data types carry the most privacy risk in personalization systems?
- What preference data do different personalized alignment methods actually need?
- Why does semantic memory abstraction outperform raw episodic recall for personalization?
- Can personalized systems reward honest disagreement instead of user confirmation?
- Why do text-based user summaries outperform embedding vectors for pluralistic alignment?
- How much does preference data freshness matter compared to data source in DPO?
- How do you partition LLM experts by domain versus by time?
- Does temporal preference drift matter more than static user profiles for personalization?
Related concepts in this collection (3)
- How much does demo position alone affect in-context learning accuracy?
  Moving demonstrations from prompt start to end without changing their content produces surprisingly large accuracy swings. Does spatial position in the prompt matter more than what demonstrations actually contain?
  positional bias extends to user profile placement
- Can text summaries beat embeddings for personalized reward models?
  When training reward models on diverse user preferences, does conditioning on learned text-based summaries of user preferences outperform embedding vectors? This matters because better representations could make personalization more interpretable and portable.
  PLUS focuses on preference dimensions (output properties) not topics (input properties)
- How do personalization granularity levels trade precision against scalability?
  LLM personalization operates at user, persona, and global levels, each with different tradeoffs. Understanding these tradeoffs helps determine when to invest in individual user data versus broader patterns.
  output-only profiles enable more data within length constraints at all granularity levels
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Understanding the Role of User Profile in the Personalization of Large Language Models
- Personalization of Large Language Models: A Survey
- Personalized Language Modeling from Personalized Human Feedback
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
- User-LLM: Efficient LLM Contextualization with User Embeddings
- Enhancing personalized multi-turn dialogue with curiosity reward
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
Original note title
historical user outputs drive personalization more effectively than input queries — personalization information not semantic information is the active ingredient