Can conversations themselves personalize without user profiles?

Can a conversational AI learn about user traits and adapt in real time by rewarding itself for asking insightful questions, rather than relying on pre-collected profiles or historical data?

Synthesis note · 2026-02-23 · sourced from Assistants Personalization

Most LLM personalization requires something before the conversation starts — a user profile, historical interactions, preference embeddings, or calibration queries. The curiosity reward approach inverts this: the conversation itself is the personalization mechanism.

The key idea: augment standard RLHF with an auxiliary reward that measures how much each turn improves the model's belief about the user's latent type. The agent is rewarded for reducing its uncertainty about who it's talking to. This creates an intrinsic drive to ask insightful questions, make context-sensitive probes, and adapt responses based on inferred traits — rather than passively responding to stated preferences.
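
One way to make "reducing uncertainty about who it's talking to" concrete is a Bayesian belief over a small set of user types, with the reward defined as the drop in entropy after a user reply. This is a minimal sketch under assumptions that are not in the note: the type names, the likelihood values, and the choice of entropy as the uncertainty measure are all illustrative.

```python
import math

def update_belief(prior, likelihoods):
    """Bayes' rule over discrete user types.

    prior: dict mapping type -> probability before the reply.
    likelihoods: dict mapping type -> P(observed reply | type) (assumed given).
    """
    unnormalized = {t: prior[t] * likelihoods[t] for t in prior}
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}

def entropy(belief):
    """Shannon entropy in bits; zero-probability types contribute nothing."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def curiosity_reward(prior, posterior):
    """Positive when the turn made the belief about the user sharper."""
    return entropy(prior) - entropy(posterior)

# Hypothetical two-type example from the education setting.
prior = {"visual": 0.5, "hands_on": 0.5}
likelihoods = {"visual": 0.9, "hands_on": 0.1}  # illustrative values only
posterior = update_belief(prior, likelihoods)
reward = curiosity_reward(prior, posterior)
```

Under this sketch, a reply that is equally likely under every type leaves the belief unchanged and earns zero reward, so only questions that discriminate between user types pay off.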

The architecture separates two reward channels:

End-of-conversation sparse reward — standard RLHF signal for overall conversation quality
Turn-based intrinsic reward — improvement in user type prediction accuracy after each action
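
The two channels can be sketched as a per-turn reward list for a single conversation. The function below is a hypothetical illustration, not the source's implementation: it assumes the accuracy signal is the probability the model assigns to the true user type, and the `weight` parameter and the choice to add the sparse reward to the final turn are assumptions.

```python
def shaped_rewards(true_type_probs, final_reward, weight=1.0):
    """Combine turn-based intrinsic rewards with a sparse end-of-conversation reward.

    true_type_probs: probability assigned to the true user type before the
        first turn and after each turn (length = number of turns + 1).
    final_reward: scalar conversation-quality reward, given once at the end.
    weight: hypothetical coefficient trading off curiosity against quality.
    """
    if len(true_type_probs) < 2:
        raise ValueError("need the initial probability plus at least one turn")
    # Intrinsic channel: improvement in prediction accuracy after each action.
    rewards = [
        weight * (after - before)
        for before, after in zip(true_type_probs, true_type_probs[1:])
    ]
    # Sparse channel: the quality reward arrives only on the last turn.
    rewards[-1] += final_reward
    return rewards

# Illustrative numbers: the model grows more confident over three turns.
rewards = shaped_rewards([0.25, 0.40, 0.70, 0.85], final_reward=1.0, weight=0.5)
```

Because each intrinsic term is a difference of consecutive probabilities, the terms telescope: their sum equals `weight` times the total accuracy gain over the conversation, so in this form the agent cannot accumulate intrinsic reward without actually ending up better informed.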

This dual signal forces a balance between helpfulness and inquisitiveness. Without the curiosity reward, models default to passive helpfulness (see Why can't conversational AI agents take the initiative?). With it, models learn to gather information about users strategically.

The approach was tested in two domains: education (inferring learning style to adapt teaching) and fitness (inferring lifestyle attributes to personalize exercise recommendations). The simulation used 20 user attributes, of which 5 were decision-relevant and 15 were background attributes, emulating real-world complexity where most user characteristics are irrelevant noise.
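
A simulated user with this 5-of-20 structure could be generated roughly as follows. Only the counts come from the note; the attribute names, the three-level value set, and the idea of defining the user type as the tuple of decision-relevant values are assumptions made for illustration.

```python
import random

NUM_ATTRIBUTES = 20  # total user attributes, as described in the note
NUM_RELEVANT = 5     # decision-relevant attributes, as described in the note

def make_user(rng):
    """Sample a hypothetical user: 20 attribute values, 5 marked as relevant."""
    names = [f"attr_{i}" for i in range(NUM_ATTRIBUTES)]  # placeholder names
    relevant = set(rng.sample(names, NUM_RELEVANT))
    values = {name: rng.choice(["low", "medium", "high"]) for name in names}
    return values, relevant

def user_type(values, relevant):
    """Assumed definition: the type depends only on decision-relevant attributes."""
    return tuple(values[name] for name in sorted(relevant))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
values, relevant = make_user(rng)
latent_type = user_type(values, relevant)
```

In this setup the 15 background attributes never affect `user_type`, so an agent that asks about them learns nothing that improves its prediction, which is the noise the simulation is meant to emulate.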

The distinction from prior work is sharp. PReF (reward factorization) requires 10 pre-conversation preference queries. PLUS (text-based summaries) requires historical interaction data. P-RLHF requires user-specific feedback data. The curiosity reward requires none of these: personalization emerges from the conversation dynamics.

This connects to Can AI agents learn when they have something worth saying? — both use intrinsic motivation, but for different purposes. Inner Thoughts drives general social proactivity (10 heuristics from cognitive psychology). Curiosity reward drives personalization-specific proactivity (reducing uncertainty about user type). Together they suggest that intrinsic motivation is a general mechanism for making AI conversationally active, with specific reward signals shaping what the activity targets.

The implication for open-ended dialogue is significant: when there's no clear task, engagement itself becomes the objective. Curiosity-driven agents that encourage users to share naturally may be more enjoyable than those that wait to be asked — and the sharing simultaneously enables better personalization.

Inquiring lines that read this note 10

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should personalization be implemented to improve AI assistant effectiveness?

Why do persona-level simulations fail to predict individual preferences accurately?

Can AI systems infer user personality without knowing the interaction context?

How should conversational agents balance goal-driven initiative with user control?

How can agents learn user preferences during conversation without pre-calibration?

Related concepts in this collection 6

Can AI agents learn when they have something worth saying? What if AI proactivity came from modeling intrinsic motivation to participate rather than predicting who speaks next? This explores whether a framework based on human cognitive patterns—internal thought generation parallel to conversation—can make agents genuinely responsive rather than passively reactive.
complementary intrinsic motivation mechanisms: social proactivity vs personalization-specific proactivity
Why can't conversational AI agents take the initiative? Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
curiosity reward directly addresses structural passivity for personalization
Why do language models respond passively instead of asking clarifying questions? Explores whether the reward signals used to train language models might actively discourage them from seeking clarification or taking initiative in conversations, and what alternative training approaches might enable more collaborative dialogue.
curiosity reward IS a multi-turn-aware reward that incentivizes active intent discovery
Can text summaries beat embeddings for personalized reward models? When training reward models on diverse user preferences, does conditioning on learned text-based summaries of user preferences outperform embedding vectors? This matters because better representations could make personalization more interpretable and portable.
PLUS requires historical data; curiosity reward requires none
When should proactive agents push toward their goals versus accommodate users? Proactive dialogue agents face a tension between reaching their objectives efficiently and keeping users satisfied. This question explores whether these two aims can coexist or require constant negotiation.
curiosity reward enables dynamic estimation of the cooperative degree and satisfaction factors: by reducing uncertainty about user type, the agent can better predict which topics will satisfy vs. alienate, enabling more nuanced goal weight computation in real-time
Can models learn to ask clarifying questions instead of guessing? Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
complementary active information-seeking: proactive critical thinking targets task-level missing information; curiosity reward targets user-level missing information; both transform passive agents into active seekers but for different knowledge gaps

Original note title

curiosity reward enables real-time personalization by rewarding the agent for reducing uncertainty about user type during multi-turn conversation
