Can single sessions alone rival history-rich recommendation?

Can encoder-only transformers with clever masking capture enough sequential signal from a single anonymous session to match recommenders that use extensive user history? This note explores whether architectural choices can compensate for sparse data.

Synthesis note · 2026-06-03 · sourced from Recommenders Architectures

Session-based recommendation predicts the next item from a single, often anonymous session, with no historical user profile to lean on. Sequential Masked Modeling (SMM) adapts encoder-only transformers (BERT/DeBERTa-style) to this regime with two pieces: sliding-window data augmentation, which turns one session into many sub-sequences, and a penultimate-token masking strategy that captures sequential dependencies better than standard masking. Across Yoochoose, Diginetica, and Tmall, Transformer-SMM models consistently outperform other single-session approaches and rival cross-session/multi-relation methods that have access to more extensive user history, despite using only single-session data.
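As a rough illustration of the sliding-window augmentation described above, here is a minimal sketch. The function name `sliding_windows`, the `window` and `min_len` parameters, and the stride of one are assumptions made for illustration; the note does not specify how SMM sizes or strides its windows.

```python
from typing import List, Sequence


def sliding_windows(session: Sequence[int], window: int, min_len: int = 2) -> List[List[int]]:
    """Turn one session into overlapping sub-sequences (stride 1, assumed).

    Sessions no longer than the window are kept whole, provided they have
    at least `min_len` items; shorter sessions are dropped.
    """
    items = list(session)
    if len(items) <= window:
        return [items] if len(items) >= min_len else []
    # One sub-sequence per start position that still fits a full window.
    return [items[i:i + window] for i in range(len(items) - window + 1)]


# Hypothetical session of item ids.
session = [11, 42, 7, 93, 5]
print(sliding_windows(session, window=3))
# [[11, 42, 7], [42, 7, 93], [7, 93, 5]]
```

A single five-item session yields three training sub-sequences here, which is the sense in which augmentation stretches sparse single-session data.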

The keeper is the masking design: where standard masked modeling hides random tokens, masking the penultimate token in augmented sequences directly targets next-item prediction, letting an encoder-only model extract strong sequential signal from minimal context — matching methods that need richer history.
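The contrast between the two masking schemes can be sketched as follows. This is one literal reading of "masking the penultimate token", not the authors' implementation: `MASK_ID`, `mask_penultimate`, and `mask_random` are hypothetical names, and the note does not say how the masked sequence is fed to the encoder or scored.

```python
import random
from typing import List, Sequence, Tuple

MASK_ID = 0  # hypothetical id reserved for the mask token (assumption)


def mask_penultimate(seq: Sequence[int]) -> Tuple[List[int], int, int]:
    """Hide the second-to-last item; return (masked sequence, position, target)."""
    if len(seq) < 2:
        raise ValueError("need at least two items to mask the penultimate one")
    pos = len(seq) - 2
    masked = list(seq)
    target = masked[pos]
    masked[pos] = MASK_ID
    return masked, pos, target


def mask_random(seq: Sequence[int], rng: random.Random) -> Tuple[List[int], int, int]:
    """Standard-style masking for comparison: hide one uniformly chosen item."""
    if not seq:
        raise ValueError("cannot mask an empty sequence")
    pos = rng.randrange(len(seq))
    masked = list(seq)
    target = masked[pos]
    masked[pos] = MASK_ID
    return masked, pos, target


print(mask_penultimate([11, 42, 7]))
# ([11, 0, 7], 1, 42)
```

Under this reading, the penultimate scheme always places the training target at a fixed position near the end of the sub-sequence, whereas random masking spreads targets across all positions.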

This sits in the vault's recommender thread as a session-modeling architecture note. It shares the "sequential structure matters" lesson with "Does conversation order matter for recommending items in dialogue?", and its do-more-with-less framing rhymes with the inductive-bias-over-capacity results elsewhere in the recommenders cluster.

Inquiring lines that read this note (3)

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

How should dialogue recommender systems manage conversation history and state?

How much does sliding-window augmentation improve single-session modeling?

How can recommendation systems balance personalization with stability and coverage?

What sequential patterns emerge from anonymous single-session data?

Can graph structure and relationships fundamentally improve recommendation systems?

Can encoder-only architectures match decoder-based sequential models for recommendation?

Related concepts in this collection (2)

Does conversation order matter for recommending items in dialogue? Conversational recommendation systems typically ignore the sequence in which items are mentioned, treating dialogue as a bag of entities. But does the order itself carry predictive signal about what to recommend next?
shared lesson that sequential structure carries signal bag-of-features models discard
Do LLM movie recommenders actually personalize to individual users? While LLMs excel at explaining recommendations, do they truly adapt to each user's preferences and taste? A 160-user study tests whether personalized prompting techniques can close the personalization gap.
adjacent recommender finding; SMM is the lightweight-architecture counterpoint to LLM-based recommendation
