Does sequential structure within sessions complement cross-session preference channels?
This explores whether the order of what a user does inside a single session adds something distinct from longer-running preference signals built up across many sessions — and the corpus suggests they're complementary channels that capture different things, not redundant ones.
The clearest answer in the corpus comes from work on conversational recommenders, which argues that most current systems lean on only one channel, the active session, and so lose the item-based and user-based collaborative-filtering signals that traditional recommenders proved valuable. The proposal is to model users through three channels at once (the current session, historical dialogues, and look-alike users), all conditioned on the user's present intent (see "Can conversational recommenders recover lost preference signals from history?"). So the answer to the literal question is yes, but with a twist: within-session structure isn't just *added* to cross-session channels; it is the lens that tells you which of the longer-term signals are relevant right now.
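To make "conditioned on the user's present intent" concrete, here is a minimal sketch that assumes each of the three channels has already been compressed into a fixed-size embedding and uses scaled dot-product attention with the current intent as the query. The function `fuse_channels` and the choice of attention are illustrative assumptions, not the cited framework's published architecture.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1-D array
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


def fuse_channels(intent: np.ndarray,
                  session: np.ndarray,
                  history: np.ndarray,
                  lookalikes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical intent-conditioned fusion of three user channels.

    Each channel is a d-dimensional embedding (an assumption); the current
    intent is the attention query, so the channel most aligned with it
    receives the largest weight in the fused user representation.
    """
    channels = np.stack([session, history, lookalikes])     # shape (3, d)
    scores = channels @ intent / np.sqrt(intent.shape[0])   # scaled dot products
    weights = softmax(scores)                               # one weight per channel
    return weights @ channels, weights                      # fused vector, weights
```

With `intent = [1, 0]`, `session = [0, 1]`, `history = [1, 0]` and `lookalikes = [0.5, 0.5]`, the history channel receives the largest weight. The design point is that the weights are recomputed for every intent, so the same history can dominate one turn and fade on the next.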
That conditioning point is where it gets interesting. A separate line of work on personalization finds that abstract, summarized preference knowledge consistently beats replaying specific past interactions and, tellingly, that recency-based recall beats similarity-based retrieval (see "Does abstract preference knowledge outperform specific interaction recall?"). Recency is a sequential signal: what you did most recently shapes what matters now. This suggests the cross-session channel works best not as a flat archive of everything you've ever liked, but as a compressed summary that is re-weighted by where you are in the current sequence.
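One way to read "a compressed summary re-weighted by recency" is an exponentially decayed average over past interactions. The sketch below assumes each interaction carries a timestamp and a small tag-to-weight dictionary; the `Interaction` type, the `half_life` parameter and the decay scheme are hypothetical choices for illustration, not the method evaluated in the PRIME work.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    item: str
    timestamp: float          # hypothetical unit: hours
    tags: dict[str, float]    # hypothetical feature weights describing the item


def recency_summary(history: list[Interaction], now: float,
                    half_life: float = 24.0) -> dict[str, float]:
    """Compress history into one tag profile, weighting each interaction by recency.

    Each interaction's weight halves every `half_life` time units (exponential
    decay). This is an illustrative assumption, not a published scheme.
    """
    profile: dict[str, float] = {}
    total = 0.0
    for it in history:
        w = 0.5 ** ((now - it.timestamp) / half_life)
        total += w
        for tag, value in it.tags.items():
            profile[tag] = profile.get(tag, 0.0) + w * value
    # Normalize so the profile is a weighted average rather than a raw sum
    return {tag: v / total for tag, v in profile.items()} if total else {}
```

For a jazz interaction 48 hours ago and a rock interaction just now, with a 24-hour half-life, the profile comes out as 0.2 jazz and 0.8 rock: the older signal is kept in the summary, but discounted.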
The corpus also suggests the two channels are doing genuinely different jobs rather than two versions of the same job. One framing splits memory into episodic events (what happened, in order) and semantic knowledge (stable facts about you), and keeps them in separate but linked structures; the architecture mirrors how humans bind moment-to-moment experience to durable knowledge of a person (see "Can agents learn preferences by watching rather than asking?"). Read against the personalization work, that is the same complementarity: sequence-aware episodic signal feeds and updates the slower semantic layer.
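The episodic-to-semantic flow can be sketched as an append-only ordered log that is periodically folded into durable attribute counts. This is a deliberately flat, hypothetical structure (`UserMemory`, `record`, `consolidate` and `min_support` are all illustrative names); it does not reproduce M3-Agent's entity-centric graph or its parallel memorization and control processes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserMemory:
    """Hypothetical two-layer memory: ordered episodic log plus a semantic profile."""
    episodes: list[tuple[int, str, str]] = field(default_factory=list)  # (step, action, attribute)
    semantic: Counter = field(default_factory=Counter)  # durable attribute counts
    consolidated_upto: int = 0  # number of episodes already folded into `semantic`

    def record(self, action: str, attr: str) -> None:
        # Episodic layer: append in order; nothing is summarized at this point
        self.episodes.append((len(self.episodes), action, attr))

    def consolidate(self, min_support: int = 2) -> dict[str, int]:
        # Fold only the not-yet-consolidated episodes into the semantic counts
        for _, action, attr in self.episodes[self.consolidated_upto:]:
            if action == "like":
                self.semantic[attr] += 1
        self.consolidated_upto = len(self.episodes)
        # Attributes seen often enough are treated as stable preferences
        return {a: c for a, c in self.semantic.items() if c >= min_support}
```

Recording likes for sci-fi, sci-fi and comedy plus a skip on romance, then consolidating, yields `{"sci-fi": 2}`. The full ordered log stays available for sequence-aware use, and re-consolidating without new episodes does not double-count.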
There's a useful complication here. A user isn't one preference vector that the latest action nudges; they can hold several distinct "personas," and which one is active depends on the candidate item in front of them (see "Can attention mechanisms reveal which user taste explains each recommendation?"). That reframes the whole question: sequential structure within a session may not just refine a single long-term preference but *select which long-term preference is live*. On that reading, within-session order acts as a switch over cross-session channels, not merely a refinement of them.
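The idea that the candidate item selects which persona is live can be sketched as attention in which the item embedding is the query and the user's persona vectors supply both keys and values. This is a simplified, hypothetical formulation (`persona_score` and the softmax-over-dot-products weighting are assumptions), not AMP-CF's exact model.

```python
import numpy as np


def persona_score(personas: np.ndarray, item: np.ndarray) -> tuple[float, int]:
    """Score one candidate item against a user modeled as several persona vectors.

    personas: shape (k, d), one row per hypothetical persona.
    item:     shape (d,), the candidate item embedding.
    Because the item is the attention query, persona weights change per item.
    """
    affinities = personas @ item                   # (k,) match between item and each persona
    weights = np.exp(affinities - affinities.max())
    weights /= weights.sum()                       # softmax over personas
    score = float(weights @ affinities)            # attention-weighted relevance
    return score, int(weights.argmax())            # score plus the persona that "explains" it
```

With two orthogonal personas, an item aligned with the first is explained by persona 0 and an item aligned with the second by persona 1, so each recommendation can be traced to the persona that drove it.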
If you want to go one level deeper, two adjacent notes show how these channels get encoded in practice: simulators that separate session-level latents, which encode a stable user profile, from turn-level latents, which encode the intent that shifts as the conversation moves (see "Can controlled latent variables make LLM user simulators realistic?"); and reward models conditioned on learned text summaries of preference rather than raw embeddings (see "Can text summaries beat embeddings for personalized reward models?"). Across all of it, the pattern is consistent: the durable cross-session signal supplies the *what*, and the in-session sequence decides *which part of it applies now*.
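The session-level/turn-level split can be illustrated with a prompt builder for an LLM user simulator: the session latent is sampled once and repeated in every prompt, while the turn latent is redrawn each turn. `SessionLatent`, `TURN_INTENTS`, the prompt wording and the stop-on-accept rule are all hypothetical; RecLLM's actual conditioning and its realism evaluations are not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionLatent:
    """Session-level latent: sampled once, fixed for the whole conversation."""
    persona: str      # short user-profile description (hypothetical format)
    max_turns: int


# Hypothetical turn-level intents; a real simulator would define its own taxonomy
TURN_INTENTS = ("refine_request", "ask_about_item", "accept", "reject")


def turn_prompts(session: SessionLatent, seed: int = 0) -> list[str]:
    """Build one simulator prompt per turn, conditioning on both latent levels.

    The session latent appears verbatim in every prompt; the turn latent
    (intent) is resampled each turn, and an "accept" ends the conversation.
    """
    rng = random.Random(seed)
    prompts = []
    for turn in range(session.max_turns):
        intent = "start_request" if turn == 0 else rng.choice(TURN_INTENTS)
        prompts.append(
            f"You are a user. Profile: {session.persona}. "
            f"Turn {turn + 1} intent: {intent}. Write your next message."
        )
        if intent == "accept":
            break
    return prompts
```

Calling `turn_prompts` twice with the same seed returns the same prompts, which keeps simulated conversations reproducible while the profile stays constant and the intent varies.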
Sources (6 notes)
Current conversational recommender systems (CRS) use only the active dialogue session to infer preferences, losing the item-CF and user-CF signals proven valuable in traditional recommenders. Integrating the current session, historical dialogues, and look-alike users, conditioned on current intent, recovers essential user representation structure.
The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning outperforms preference-tuning methods.
M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.
AMP-CF represents each user as multiple latent personas weighted dynamically by candidate item. This makes recommendations both diverse and interpretable—each suggestion traces to the specific persona preference it satisfies—without requiring post-hoc reranking.
RecLLM demonstrates that conditioning an LLM simulator on session-level (user profile) and turn-level (user intent) latent variables produces synthetic conversations measurable as realistic via crowdsource discrimination, discriminator models, and classifier-ensemble distribution matching.
PLUS trains summarizers and reward models jointly, finding that the learned text-based preference summaries capture dimensions that zero-shot summaries miss. These summaries transfer to GPT-4 for zero-shot personalization and remain interpretable to users.