INQUIRING LINE

How does sequential modeling within a session differ from modeling historical purchase sequences?

This explores the difference between modeling the order of actions inside a single session (short-term, in-the-moment intent) and modeling a user's long history of purchases (long-term, persistent preference), and what the corpus says each one actually captures.


This question is really about two different time horizons for the same user, and the corpus suggests they aren't just shorter and longer versions of the same thing: they reward different modeling choices. Within-session sequential modeling is about the *order* of recent actions: what you clicked just now shapes what you want next. But that order is fragile. Language models used as recommenders turn out to ignore temporal order by default. They'll happily read your interaction history as an unordered bag of items unless you explicitly prompt them to weight recent actions or show them in-context examples, at which point latent order-sensitivity is activated without any retraining (see "Why do language models ignore temporal order in ranking?"). So 'within-session sequence' is less a free property of the model and more something you have to deliberately surface.
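To make the recency-prompting idea concrete, here is a minimal sketch of a prompt builder that either lists the history plainly or adds an explicit recency cue. The function name `build_ranking_prompt`, its parameters, and the prompt wording are all hypothetical illustrations, not the prompts used in the cited work.

```python
from typing import List


def build_ranking_prompt(
    history: List[str],
    candidates: List[str],
    emphasize_recency: bool = True,
) -> str:
    """Build a ranking prompt from a history listed oldest first.

    Hypothetical helper: the wording below is an assumption chosen for
    illustration, not a prompt taken from any paper.
    """
    lines = ["I interacted with these items, in order from oldest to most recent:"]
    # Number the items so the order is visible in the text itself.
    lines += [f"{i}. {item}" for i, item in enumerate(history, start=1)]
    if emphasize_recency and history:
        # The explicit cue: name the latest interaction so it is not
        # treated as just one more element of an unordered set.
        lines.append(f"Note that my most recent interaction was: {history[-1]}.")
    lines.append("Rank these candidates by how likely I am to want them next:")
    lines += [f"- {candidate}" for candidate in candidates]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_ranking_prompt(["tent", "stove"], ["lantern", "novel"]))
```

The only difference between the two variants is one extra sentence in the prompt, which is the point: the order signal is surfaced by the prompt, not by changing the model.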

Historical purchase sequences pull in the opposite direction. Here the interesting finding is that retrieving specific past interactions (episodic memory) works *worse* than compressing history into abstract preference summaries (semantic memory). When past interactions are recalled at all, recency-based recall beats similarity-based retrieval, but a learned summary of 'what this person tends to like' still outperforms replaying the literal log of what they bought (see "Does abstract preference knowledge outperform specific interaction recall?"). The lesson cuts against intuition: long histories are most useful when you throw most of the sequence away and keep the distilled preference.
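The contrast between the two memory styles can be sketched in a few lines. This is a toy illustration under loud assumptions: the `(item, category)` log format, the function names, and especially the frequency count standing in for a learned preference summary are all hypothetical, and are much simpler than the summaries or parametric encodings the cited framework evaluates.

```python
from collections import Counter
from typing import List, Tuple

# Assumed log format: (item, category) pairs, oldest first.
Interaction = Tuple[str, str]


def recency_recall(log: List[Interaction], k: int) -> List[Interaction]:
    """Episodic memory: return the k most recent interactions verbatim."""
    return log[-k:] if k > 0 else []


def preference_summary(log: List[Interaction], top_n: int = 2) -> List[str]:
    """Semantic memory (toy stand-in): discard the order and keep only
    the categories the user returns to most often."""
    counts = Counter(category for _, category in log)
    return [category for category, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    log = [
        ("tent", "camping"),
        ("stove", "camping"),
        ("novel", "books"),
        ("lantern", "camping"),
        ("cookbook", "books"),
        ("headphones", "audio"),
    ]
    print(recency_recall(log, 2))
    print(preference_summary(log))
```

In this example the two views disagree: the recent slice ends on a one-off audio purchase, while the summary keeps the categories that recur across the whole log.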

The sharpest reframing comes from work showing that neither raw sessions nor raw purchase logs capture what users are actually doing. Two-thirds of users (66%) are pursuing 'interest journeys' that last more than a month. These are specific, nameable pursuits like 'designing hydroponic systems for small spaces', and classic collaborative filtering misses them because it operates on item co-occurrence, not user-level meaning (see "Can language models discover what users actually want from activity logs?"). An interest journey sits *between* the session and the lifetime history: longer than a session, more coherent than a scatter of purchases. It suggests the real distinction isn't session-vs-history but short-term intent vs. persistent goal, and that the most valuable signal lives at a granularity neither traditional approach was built to see.

There's also an architectural angle the corpus surfaces obliquely. The session/history split is partly a stability-vs-plasticity problem: you want to absorb new behavior fast (this session) without forgetting old patterns (the history). Streaming-recommendation work handles exactly this tension by isolating new parameters for emerging preferences while preserving older ones exactly, rather than letting fresh data overwrite the past (see "Can model isolation solve streaming recommendation better than replay?"). Read against the personalization findings, this hints that 'session vs. history' is best treated as two compartments with different update rules: fast and overwritable for the session, slow and protected for the long-term preference.
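One way to picture "two compartments with different update rules" is a profile with a fast session store and a slow long-term store. This sketch is an assumption-laden illustration, not DEGC: the class name, the rates, and the use of exponential moving averages are all hypothetical, and unlike parameter isolation the slow compartment here still drifts (just slowly) rather than being preserved exactly.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TwoCompartmentProfile:
    """Toy user profile with separate short-term and long-term scores."""

    fast_rate: float = 0.8   # assumed: session reacts strongly to each action
    slow_rate: float = 0.05  # assumed: long-term store moves only slightly
    session: Dict[str, float] = field(default_factory=dict)
    long_term: Dict[str, float] = field(default_factory=dict)

    def observe(self, category: str) -> None:
        """Fast rule: decay all session scores, then boost the new action."""
        for key in self.session:
            self.session[key] *= 1 - self.fast_rate
        self.session[category] = self.session.get(category, 0.0) + self.fast_rate

    def end_session(self) -> None:
        """Slow rule: blend the session into long-term scores, then reset."""
        for key in set(self.long_term) | set(self.session):
            old = self.long_term.get(key, 0.0)
            new = self.session.get(key, 0.0)
            self.long_term[key] = (1 - self.slow_rate) * old + self.slow_rate * new
        self.session.clear()  # the session compartment is overwritable


if __name__ == "__main__":
    profile = TwoCompartmentProfile(long_term={"camping": 1.0})
    profile.observe("audio")
    profile.observe("audio")
    print(profile.session)
    profile.end_session()
    print(profile.long_term)
```

A burst of audio activity dominates the session scores immediately, but after the session ends the long-term store still ranks camping well above audio.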

The thing you might not have expected to learn: these notes quietly converge on the idea that order matters most at the short horizon and matters *least* at the long one, where abstraction wins over sequence. The longer the history, the more you should be modeling a person's stable goals rather than the literal chain of what they did.


Sources (4 notes)

Why do language models ignore temporal order in ranking?

LLMs can extract preferences from interaction histories but disregard temporal order by default. Recency-focused prompts and in-context examples activate latent order-sensitivity, improving ranking without retraining.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Can language models discover what users actually want from activity logs?

66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.

Can model isolation solve streaming recommendation better than replay?

DEGC uses per-task parameter isolation to handle streaming recommendation, providing explicit stability-plasticity trade-offs that experience replay and knowledge distillation methods cannot match. This approach preserves older patterns exactly while allowing new parameters to capture emerging preferences.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a recommendation systems researcher. The question remains open: how should we model user behavior *within* a single session versus across a user's entire purchase history—are they fundamentally different problems, or can one framework unify them?

What a curated library found — and when (2023–2026 claims, likely now dated):
• LLMs used as rankers ignore temporal order by default; recency must be *explicitly* prompted to surface, with no retraining required (2023-05, arXiv:2305.08845).
• Long historical sequences are *worse* than compressed preference summaries; semantic abstraction beats episodic replay for LLM-based recommendation (inferred from 2305.15498 context).
• Two-thirds of users pursue "interest journeys" lasting >1 month—coherent goals invisible to item co-occurrence models—sitting between session-level intent and lifetime history (2023-05, arXiv:2305.15498).
• Streaming recommendation architectures isolate new parameters for emerging preferences while protecting older patterns, solving the stability–plasticity trade-off (2023-03, arXiv:2303.11700).
• Recent work (2025–2026) frames long-horizon tasks in RL terms: credit assignment, tree search, and hybrid reward scheduling now appear critical for sequential decision-making with LLMs (arXiv:2602.12342, arXiv:2509.21240, arXiv:2508.12790).

Anchor papers (verify; mind their dates):
• arXiv:2305.08845 (2023-05): LLMs as zero-shot rankers, order sensitivity via prompting.
• arXiv:2305.15498 (2023-05): User interest journeys and persistent goals.
• arXiv:2303.11700 (2023-03): Streaming recommendation with dynamic parameter isolation.
• arXiv:2602.12342 (2026-02): Credit assignment for long-horizon interaction.

Your task:
(1) RE-TEST THE COMPARTMENTALIZATION THESIS. The library suggests session and history are fundamentally different (fast plasticity vs. slow stability). Do newer LLM personalization systems (e.g., arXiv:2507.04607 on cognitive memory, arXiv:2507.22844 on verifiable reasoning) still require separate update rules, or have unified architectures emerged that fluidly handle both timescales without explicit compartments? Cite what changed.
(2) Surface the strongest work contradicting the "abstraction beats sequence" claim—especially from RL + LLM papers (last 6 months). Do credit-assignment and tree-search methods restore value to detailed sequential history, or reinforce the preference for summaries?
(3) Propose two research questions: (a) Can "interest journeys" be learned and maintained *jointly* with session-level recency without divergence? (b) Does in-context learning (arXiv:2312.03801) make the session/history split obsolete by treating both as prompt tokens?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
