Can conversations themselves personalize without user profiles?
Can a conversational AI learn about user traits and adapt in real time by rewarding itself for asking insightful questions, rather than relying on pre-collected profiles or historical data?
Most LLM personalization requires something before the conversation starts — a user profile, historical interactions, preference embeddings, or calibration queries. The curiosity reward approach inverts this: the conversation itself is the personalization mechanism.
The key idea: augment standard RLHF with an auxiliary reward that measures how much each turn improves the model's belief about the user's latent type. The agent is rewarded for reducing its uncertainty about who it's talking to. This creates an intrinsic drive to ask insightful questions, make context-sensitive probes, and adapt responses based on inferred traits — rather than passively responding to stated preferences.
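One way to make the auxiliary reward concrete is as a difference between the agent's belief over user types before and after a turn. The sketch below is illustrative only: the note does not give the exact functional form, so both variants (gain in probability on the true type, and reduction in entropy) are assumptions, as are the function names and the two-type example.

```python
import math
from typing import Sequence


def accuracy_gain(before: Sequence[float], after: Sequence[float], true_type: int) -> float:
    """Increase in probability assigned to the true user type over one turn."""
    return after[true_type] - before[true_type]


def entropy(belief: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)


def entropy_reduction(before: Sequence[float], after: Sequence[float]) -> float:
    """Drop in uncertainty over user types across one turn."""
    return entropy(before) - entropy(after)


# Hypothetical two-type example (labels are illustrative, not from the note):
# index 0 = "visual learner", index 1 = "hands-on learner".
prior = [0.5, 0.5]
posterior = [0.8, 0.2]  # assumed belief after the user answers a probing question

gain = accuracy_gain(prior, posterior, true_type=0)
info = entropy_reduction(prior, posterior)
```

The accuracy-gain variant needs the true type, which is available in simulation; the entropy variant does not, but it also rewards confident wrong beliefs, so the two are not interchangeable.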
The architecture separates two reward channels:
- End-of-conversation sparse reward — standard RLHF signal for overall conversation quality
- Turn-based intrinsic reward — improvement in user type prediction accuracy after each action
This dual signal forces a balance between helpfulness and inquisitiveness. Without the curiosity reward, models default to passive helpfulness (see Why can't conversational AI agents take the initiative?). With it, models learn to strategically gather information about users.
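The two channels can be combined into a single per-turn reward sequence. This is a minimal sketch under stated assumptions: the mixing coefficient `weight`, the discount `gamma`, and the function names are hypothetical and not taken from the note.

```python
from typing import List


def shaped_rewards(intrinsic: List[float], final_quality: float, weight: float = 0.5) -> List[float]:
    """Per-turn rewards: a weighted curiosity bonus on every turn, plus the
    sparse conversation-level reward added on the last turn only."""
    if not intrinsic:
        raise ValueError("need at least one turn")
    rewards = [weight * r for r in intrinsic]
    rewards[-1] += final_quality
    return rewards


def discounted_return(rewards: List[float], gamma: float = 1.0) -> float:
    """Discounted sum of rewards, measured from the first turn."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical 3-turn conversation in which the agent learns most on turn 1.
turn_gains = [0.30, 0.10, 0.0]
rewards = shaped_rewards(turn_gains, final_quality=1.0, weight=0.5)
total = discounted_return(rewards)

# With weight=0.0 the curiosity channel vanishes and only the sparse signal remains.
sparse_only = shaped_rewards(turn_gains, final_quality=1.0, weight=0.0)
```

Setting `weight` too high would favour interrogation over helpfulness, and setting it to zero recovers the passive baseline, which is the balance the dual signal is meant to strike.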
The approach was tested in two domains: education (inferring learning style to adapt teaching) and fitness (inferring lifestyle attributes to personalize exercise recommendations). The simulation used 20 user attributes, 5 decision-relevant and 15 background, emulating real-world complexity where most user characteristics are irrelevant noise.
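The attribute split can be illustrated with a toy simulated user. Only the counts (20 attributes, 5 relevant, 15 background) come from the note; binary attribute values, placing the relevant attributes first, and the integer encoding of the type are assumptions made for this sketch.

```python
import random
from typing import List

NUM_ATTRIBUTES = 20  # from the note
NUM_RELEVANT = 5     # from the note; the remaining 15 are background


def sample_user(rng: random.Random) -> List[int]:
    """Draw a user as 20 binary attributes (binary values are an assumption)."""
    return [rng.randint(0, 1) for _ in range(NUM_ATTRIBUTES)]


def user_type(attributes: List[int]) -> int:
    """Hypothetical latent type: the 5 relevant attributes read as a 5-bit integer."""
    t = 0
    for bit in attributes[:NUM_RELEVANT]:
        t = t * 2 + bit
    return t


rng = random.Random(0)
user = sample_user(rng)

# Resampling the 15 background attributes leaves the latent type unchanged.
background = [rng.randint(0, 1) for _ in range(NUM_ATTRIBUTES - NUM_RELEVANT)]
noisy = user[:NUM_RELEVANT] + background
same_type = user_type(noisy) == user_type(user)
```

In this setup a question about a background attribute cannot change the belief over types, so it earns no curiosity reward; the agent has to learn which questions are worth asking.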
The distinction from prior work is sharp. PReF (reward factorization) requires 10 pre-conversation preference queries. PLUS (text-based summaries) requires historical interaction data. P-RLHF requires user-specific feedback data. The curiosity reward requires none of these: personalization emerges from the conversation dynamics.
This connects to the note Can AI agents learn when they have something worth saying? Both approaches use intrinsic motivation, but for different purposes. Inner Thoughts drives general social proactivity (10 heuristics from cognitive psychology). Curiosity reward drives personalization-specific proactivity (reducing uncertainty about user type). Together they suggest that intrinsic motivation is a general mechanism for making AI conversationally active, with specific reward signals shaping what the activity targets.
The implication for open-ended dialogue is significant: when there's no clear task, engagement itself becomes the objective. Curiosity-driven agents that encourage users to share naturally may be more enjoyable than those that wait to be asked — and the sharing simultaneously enables better personalization.
Inquiring lines that use this note as a source 9
This note is a source for these synthesized inquiries.
- Which personalization techniques expose user data most directly?
- Does personalization help or hurt persistent companion chatbots?
- Can personalized questions improve conversation quality in open-domain chat?
- Can curiosity-driven personalization work better than pre-conversation preference elicitation?
- How much user interaction data is needed for effective AI personalization?
- Can personalization delay or prevent novelty decay in chatbot relationships?
- Can AI systems infer user personality without knowing the interaction context?
- What production costs does personalization infrastructure impose on AI systems?
- How can agents learn user preferences during conversation without pre-calibration?
Related concepts in this collection 6
- Can AI agents learn when they have something worth saying?
What if AI proactivity came from modeling intrinsic motivation to participate rather than predicting who speaks next? This explores whether a framework based on human cognitive patterns—internal thought generation parallel to conversation—can make agents genuinely responsive rather than passively reactive.
complementary intrinsic motivation mechanisms: social proactivity vs personalization-specific proactivity
- Why can't conversational AI agents take the initiative?
Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
curiosity reward directly addresses structural passivity for personalization
- Why do language models respond passively instead of asking clarifying questions?
Explores whether the reward signals used to train language models might actively discourage them from seeking clarification or taking initiative in conversations, and what alternative training approaches might enable more collaborative dialogue.
curiosity reward IS a multi-turn-aware reward that incentivizes active intent discovery
- Can text summaries beat embeddings for personalized reward models?
When training reward models on diverse user preferences, does conditioning on learned text-based summaries of user preferences outperform embedding vectors? This matters because better representations could make personalization more interpretable and portable.
PLUS requires historical data; curiosity reward requires none
- When should proactive agents push toward their goals versus accommodate users?
Proactive dialogue agents face a tension between reaching their objectives efficiently and keeping users satisfied. This question explores whether these two aims can coexist or require constant negotiation.
curiosity reward enables dynamic estimation of the cooperative degree and satisfaction factors: by reducing uncertainty about user type, the agent can better predict which topics will satisfy vs. alienate, enabling more nuanced goal weight computation in real-time
- Can models learn to ask clarifying questions instead of guessing?
Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
complementary active information-seeking: proactive critical thinking targets task-level missing information; curiosity reward targets user-level missing information; both transform passive agents into active seekers but for different knowledge gaps
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Enhancing personalized multi-turn dialogue with curiosity reward
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
- From speaking like a person to being personal: The effects of personalized, regular interactions with conversational agents
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
- Learning Retrieval Augmentation for Personalized Dialogue Generation
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes
- Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting
- Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search
Original note title
curiosity reward enables real-time personalization by rewarding the agent for reducing uncertainty about user type during multi-turn conversation