Can single sessions alone rival history-rich recommendation?
Can encoder-only transformers with clever masking capture enough sequential signal from a single anonymous session to match recommenders that use extensive user history? This note explores whether architectural choices can compensate for sparse data.
Session-based recommendation predicts the next item from a single, often anonymous session, with no historical user profile to lean on. Sequential Masked Modeling (SMM) adapts encoder-only transformers (BERT/DeBERTa-style) to this regime with two pieces: sliding-window data augmentation (turning one session into many sub-sequences) and a penultimate-token masking strategy that captures sequential dependencies better than standard masking. Across Yoochoose, Diginetica, and Tmall, Transformer-SMM models consistently outperform other single-session approaches and rival cross-session/multi-relation methods that have access to more extensive user history, despite using only single-session data.
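The sliding-window augmentation can be illustrated with a minimal sketch. The note does not give a window size or say how short sessions are handled, so the `window` parameter, the function name `sliding_windows`, and the choice to return a short session unchanged are all assumptions made for illustration, not the paper's settings.

```python
from typing import List


def sliding_windows(session: List[int], window: int) -> List[List[int]]:
    """Turn one session into overlapping sub-sequences of `window` items.

    Hypothetical helper: the window size and the short-session rule are
    illustrative assumptions, not values taken from the SMM paper.
    """
    if window < 2:
        raise ValueError("window must cover at least two items")
    # A session no longer than the window yields itself as the only sample.
    if len(session) <= window:
        return [list(session)]
    # Slide one item at a time so every contiguous run of `window` items appears.
    return [session[i:i + window] for i in range(len(session) - window + 1)]


# One five-click session (made-up item ids) becomes three training samples.
session = [11, 42, 7, 93, 5]
for sub in sliding_windows(session, window=3):
    print(sub)
```

Each sub-sequence preserves the original click order, which is what lets a single anonymous session supply several ordered training examples.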
The keeper is the masking design: where standard masked modeling hides random tokens, masking the penultimate token in augmented sequences directly targets next-item prediction. This lets an encoder-only model extract strong sequential signal from minimal context, rivaling methods that need richer history.
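As a rough sketch of the contrast, the snippet below masks the second-to-last position of a sequence instead of a random one. The note does not describe how the masked sequence is batched, which mask id is used, or how inference is run, so `MASK_ID` and `mask_penultimate` are hypothetical names and this is one plausible reading rather than the paper's implementation.

```python
from typing import List, Tuple

MASK_ID = 0  # assumed reserved id for the [MASK] token; not from the source


def mask_penultimate(seq: List[int], mask_id: int = MASK_ID) -> Tuple[List[int], int, int]:
    """Replace the penultimate item with the mask id.

    Returns the masked sequence, the masked position, and the hidden item
    that the encoder would be trained to recover at that position.
    """
    if len(seq) < 2:
        raise ValueError("need at least two items to mask the penultimate one")
    position = len(seq) - 2      # index of the second-to-last item
    masked = list(seq)           # copy so the caller's sequence is untouched
    target = masked[position]
    masked[position] = mask_id
    return masked, position, target


# Made-up item ids: the item at index 1 is hidden and becomes the label.
masked, position, target = mask_penultimate([11, 42, 7])
print(masked, position, target)
```

Unlike random masking, the masked position here is fixed relative to the end of the sequence, so every training sample asks the model about an item late in the session.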
This sits in the vault's recommender thread as a session-modeling architecture note. It shares the lesson that sequential structure matters with "Does conversation order matter for recommending items in dialogue?", and its do-more-with-less framing echoes the inductive-bias-over-capacity results elsewhere in the recommenders cluster.
Related concepts in this collection 2
- Does conversation order matter for recommending items in dialogue?
  Conversational recommendation systems typically ignore the sequence in which items are mentioned, treating dialogue as a bag of entities. But does the order itself carry predictive signal about what to recommend next?
  shared lesson that sequential structure carries signal bag-of-features models discard
- Do LLM movie recommenders actually personalize to individual users?
  While LLMs excel at explaining recommendations, do they truly adapt to each user's preferences and taste? A 160-user study tests whether personalized prompting techniques can close the personalization gap.
  adjacent recommender finding; SMM is the lightweight-architecture counterpoint to LLM-based recommendation
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Optimizing Encoder-Only Transformers for Session-Based Recommendation Systems
- Augmenting Netflix Search with In-Session Adapted Recommendations
- Multi-Task End-to-End Training Improves Conversational Recommendation
- Dynamically Expandable Graph Convolution for Streaming Recommendation
- Scalable Neural Contextual Bandit for Recommender Systems
- Preference Discerning with LLM-Enhanced Generative Retrieval
- Large Language Models as Zero-Shot Conversational Recommenders
- Learning to Ask Critical Questions for Assisting Product Search
Original note title
encoder-only transformers with penultimate-token masking capture single-session dependencies rivaling history-rich recommenders