Can curiosity-driven dialogue incrementally discover user interest journeys in real time?
This explores whether an AI that actively asks questions during conversation, rather than passively answering, can piece together a user's evolving interests as they emerge, instead of mining them from logs after the fact.
The corpus splits the question into two halves that don't yet meet: the *discovery* of interest journeys, and the *curiosity-driven dialogue* that might surface them live.
On the discovery side, the corpus is encouraging but mostly retrospective. LLMs turn out to be remarkably good at reading persistent "interest journeys" from activity logs: 66% of users pursue a valued interest lasting over a month, described in oddly specific phrases like "designing hydroponic systems for small spaces," the kind of thing collaborative filtering never sees (Can language models discover what users actually want from activity logs?). But that's mining history after the fact. The same instinct shows up in agents that infer preferences by *watching* rather than asking, binding observations into entity-centric memory graphs (Can agents learn preferences by watching rather than asking?). Both show the journeys are recoverable, but neither recovers them through live conversation.
The harder obstacle is that today's conversational AI is structurally bad at curiosity. Models are *passive by design*: they optimize for responding to queries, not initiating topics or pursuing their own line of inquiry (Why can't conversational AI agents take the initiative?). The cause is traced to training itself: next-turn reward optimization rewards immediate helpfulness, which actively discourages a model from asking the clarifying question that would pay off three turns later (Why do language models respond passively instead of asking clarifying questions?). So the very behavior your question depends on is the behavior current training suppresses.
But several notes show the suppression is fixable, and that's the interesting turn. Conversation analysis offers a formal trigger for *when* an agent should stop and probe: "insert-expansions" that clarify intent before acting rather than recovering afterward (When should AI agents ask users instead of just searching?). Reframing reward to value long-term interaction unlocks genuine intent discovery (Why do language models respond passively instead of asking clarifying questions?). And the "incremental" part of your question has surprisingly tight bounds: adaptive questioning can pin down a personalized preference model in roughly *ten* well-chosen questions, each selected to maximally reduce uncertainty about what the user values (Can user preferences be learned from just ten questions?). Proactivity also pays for itself: supplying the right information unasked cuts conversation length by up to 60% (proactive-dialogue-can-reduce-conversation-turns-by-up-to-60-percent-but-but-is-almo).
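To make "each question selected to maximally reduce uncertainty" concrete, here is a minimal sketch of one common way to do it: keep a belief over a finite set of candidate preference vectors and ask the either/or question with the highest expected information gain. This is not the method of any cited note. The function names, the logistic answer model, and the simulated user are all assumptions made for illustration.

```python
import math
import random


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def prob_prefers_a(weights, diff):
    """Logistic chance that a user with these weights picks option A.

    diff is features(A) - features(B) for one either/or question.
    The logistic form is an assumed answer model, not taken from the notes.
    """
    margin = sum(w * d for w, d in zip(weights, diff))
    return 1.0 / (1.0 + math.exp(-margin))


def update(posterior, hypotheses, diff, prefers_a):
    """Bayes update of the belief over hypotheses after one answer."""
    likelihoods = [
        prob_prefers_a(w, diff) if prefers_a else 1.0 - prob_prefers_a(w, diff)
        for w in hypotheses
    ]
    unnormalized = [p * l for p, l in zip(posterior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def expected_info_gain(posterior, hypotheses, diff):
    """Expected drop in entropy (bits) from asking the question `diff`."""
    p_a = sum(p * prob_prefers_a(w, diff) for p, w in zip(posterior, hypotheses))
    after_a = entropy(update(posterior, hypotheses, diff, True))
    after_b = entropy(update(posterior, hypotheses, diff, False))
    return entropy(posterior) - (p_a * after_a + (1.0 - p_a) * after_b)


def pick_question(posterior, hypotheses, questions):
    """Choose the candidate question with the highest expected gain."""
    return max(questions, key=lambda d: expected_info_gain(posterior, hypotheses, d))


def run_session(n_questions=10, n_hypotheses=50, n_features=3, seed=0):
    """Simulate a short session against a hypothetical user (illustrative only)."""
    rng = random.Random(seed)
    hypotheses = [
        tuple(rng.uniform(-3, 3) for _ in range(n_features))
        for _ in range(n_hypotheses)
    ]
    questions = [
        tuple(rng.uniform(-1, 1) for _ in range(n_features)) for _ in range(40)
    ]
    true_weights = hypotheses[rng.randrange(n_hypotheses)]
    posterior = [1.0 / n_hypotheses] * n_hypotheses
    for _ in range(n_questions):
        diff = pick_question(posterior, hypotheses, questions)
        # The simulated user answers noisily according to their true weights.
        answer = rng.random() < prob_prefers_a(true_weights, diff)
        posterior = update(posterior, hypotheses, diff, answer)
        questions.remove(diff)  # never ask the same question twice
    return posterior, hypotheses, true_weights
```

Calling `run_session()` returns the final belief, the candidate vectors, and the simulated user's true vector, so you can inspect how much probability mass ten questions concentrate. A question whose two options look identical to every hypothesis has zero expected gain, so it is never preferred over one that splits the belief.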
The missing bridge is a single policy that decides *what to ask, what to recommend, and when* as one joint optimization rather than three bolted-together modules, exactly the unification conversational recommender research argues for (Can unified policy learning improve conversational recommender systems?). Two cautions are worth carrying: models need explicit training to stay on a thread and not chase distractors mid-conversation (Why do language models engage with conversational distractors?), and the appeal of chatty interaction itself decays as novelty wears off, so real-time discovery has to survive past the first few delightful sessions (Do chatbot relationships lose their appeal as novelty wears off?). The short answer: every ingredient exists in the corpus, but no note yet wires live curiosity directly into journey discovery. That synthesis is the open frontier.
Sources 10 notes
66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.
M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.
Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.
PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Responses can then be personalized to each user via inference-time reward alignment, without weight modification.
Simulations show that proactivity (providing relevant information without being asked) cuts dialogue turns by up to 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.
Research shows that formulating attribute-asking, item-recommending, and timing decisions as a single graph-based RL policy achieves better joint optimization than isolated components. Separation prevents gradient signals from informing one another and fails to optimize conversation trajectory holistically.
Fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience, revealing that the gap is not model capacity but absent training signal. Models learn to follow what-to-do instructions but not what-to-ignore instructions.
Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.