Can neural networks explore efficiently at recommendation scale?
Exploration—discovering unknown user preferences—normally requires expensive posterior uncertainty estimates. Can a neural architecture make Thompson sampling practical for real-world recommenders without prohibitive computational cost?
Supervised neural networks form the backbone of most recommenders, but they only exploit user interests that are already known. Discovering unknown preferences requires exploration, and the standard exploration framework (contextual bandits with Thompson sampling) depends on posterior uncertainty estimates, which are computationally prohibitive for large neural networks at recommendation scale.
Zhu et al. propose the Epistemic Neural Recommendation (ENR) architecture, an epistemic neural network (ENN) designed to enable Thompson sampling at scale. Epistemic neural networks separate aleatoric uncertainty (irreducible noise in outputs) from epistemic uncertainty (uncertainty about the model's parameters). The latter is what Thompson sampling needs. Each round, it samples a parameter setting from the posterior, chooses the action that is best under that setting, observes the outcome, and updates the posterior.
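The four-step loop above can be sketched in a few lines. This is a minimal illustration, assuming a linear-Gaussian reward model whose posterior is available in closed form; it is a stand-in for exposition, not the ENR architecture. All names (`LinearGaussianPosterior`, `run_thompson_sampling`, the synthetic item features) are hypothetical.

```python
import numpy as np


class LinearGaussianPosterior:
    """Exact posterior over theta for reward = x @ theta + Gaussian noise.

    Assumption for illustration only: a linear reward model, not a neural one.
    """

    def __init__(self, dim, noise_var):
        self.precision = np.eye(dim)  # prior: theta ~ N(0, I)
        self.b = np.zeros(dim)        # precision-weighted mean
        self.noise_var = noise_var

    def mean(self):
        return np.linalg.solve(self.precision, self.b)

    def sample(self, rng):
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2  # symmetrize against round-off
        return rng.multivariate_normal(cov @ self.b, cov)

    def update(self, x, reward):
        # Conjugate Bayesian linear-regression update for one observation.
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var


def run_thompson_sampling(num_rounds=500, dim=3, num_items=5,
                          noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=dim)  # unknown preference vector (synthetic)
    posterior = LinearGaussianPosterior(dim, noise_var=noise_std ** 2)
    for _ in range(num_rounds):
        items = rng.normal(size=(num_items, dim))      # candidate item features
        theta = posterior.sample(rng)                  # 1. sample parameters
        chosen = items[int(np.argmax(items @ theta))]  # 2. act on the sample
        reward = chosen @ true_theta + rng.normal(scale=noise_std)  # 3. observe
        posterior.update(chosen, reward)               # 4. update the posterior
    return true_theta, posterior
```

The expensive part in this sketch is `sample`, which inverts a matrix whose size grows with the number of parameters. That is cheap for three dimensions and impractical for a large neural network, which is the cost the note is concerned with.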
Empirically, the authors report that ENR improves click-through rate by at least 9% and user ratings by at least 6% over state-of-the-art neural contextual bandit algorithms. It reaches equivalent performance with at least 29% fewer user interactions than the best-performing baseline. Computationally, it needs orders of magnitude fewer resources than the other neural contextual bandit baselines, which makes Thompson-sampling-based exploration plausible in production rather than only in research settings.
The general principle: when a Bayesian technique seems too expensive at scale, ask whether the expensive part is genuinely necessary or whether a structural approximation captures what's needed. Epistemic networks make a focused commitment to estimating only the parameter uncertainty Thompson sampling actually uses, dropping the rest. The architectural simplification is what unlocks scale.
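One way to picture such a structural approximation is an index-conditioned model: a random "epistemic index" `z` perturbs the prediction, so drawing one `z` and acting greedily stands in for drawing one parameter sample. The sketch below follows that general idea only. The layer sizes, the untrained random weights, and the names `init_params`, `predict`, and `thompson_action` are assumptions for illustration, not the ENR design from the paper.

```python
import numpy as np


def init_params(rng, in_dim, hidden, index_dim):
    """Random (untrained) weights for a tiny index-conditioned model."""
    return {
        "W1": rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden)),
        "w_base": rng.normal(scale=hidden ** -0.5, size=hidden),
        "W_epi": rng.normal(scale=hidden ** -0.5, size=(hidden, index_dim)),
    }


def predict(params, x, z):
    """Score each row of x under epistemic index z."""
    h = np.tanh(x @ params["W1"])    # shared features, shape (n, hidden)
    base = h @ params["w_base"]      # point estimate, independent of z
    epi = (h @ params["W_epi"]) @ z  # index-dependent perturbation
    return base + epi


def thompson_action(params, candidates, rng, index_dim):
    """Approximate Thompson sampling: one index draw, one forward pass."""
    z = rng.normal(size=index_dim)
    return int(np.argmax(predict(params, candidates, z)))
```

With `z` set to zeros, `predict` returns only the base estimate; different draws of `z` give different plausible score vectors for the same candidates. Sampling costs one small random vector per decision rather than a posterior over every weight.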
Inquiring lines that use this note as a source 14
This note is a source for these synthesized inquiries.
- Does universal approximation guarantee help with finite recommendation data?
- How do neural networks extend contextual bandits beyond linear reward assumptions?
- Why do real-world platforms need inductive learning for streaming recommendation systems?
- What real-world applications have context distributions that enable exploration-free bandits?
- How does covariate diversity compare to the exploration assumptions of LinUCB?
- Why does exploration quality matter more than learner network depth?
- How does active learning reduce queries needed for user preference inference?
- Does context diversity ever make active exploration unnecessary in bandits?
- Can linear bandit methods scale beyond their original reward assumptions?
- Why should bandit algorithms condition exploration on time-of-period as well as user state?
- How many particles and iterations does optimal expert discovery require?
- How can insert-expansion techniques help users discover their own preferences?
- Can historical and batch exploration be implemented with the same algorithmic mechanism?
- Can we cheaply estimate which samples are currently most informative?
Related concepts in this collection 4
- Can bandit algorithms beat collaborative filtering for news?
  News recommendation faces constant content churn and cold-start users, settings where traditional collaborative filtering struggles. Can a contextual bandit approach like LinUCB explicitly balance exploration and exploitation better than static methods?
  extends: ENN scales the LinUCB framework beyond linear-reward assumptions while preserving the bandit framing
- When can greedy bandits skip exploration entirely?
  Under what conditions does natural randomness in incoming contexts eliminate the need for active exploration in contextual bandits? This matters for high-stakes domains like medicine where exploration carries real costs.
  tension with: ENN scales exploration; greedy-first avoids it under context diversity, so the design choice depends on context-distribution structure
- Can implicit feedback reveal both preference and confidence?
  When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
  complements: epistemic uncertainty in ENN is the bandit-style confidence signal that exploration acts on
- Why do academic recommenders fail when deployed in production?
  Academic recommendation models assume static test sets known at training time, but real platforms continuously receive new users, items, and interactions. Understanding this gap reveals what production systems actually need.
  complements: bandit framing assumes inductive learning; ENR is the production-scale exploration primitive for inductive recommenders
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Scalable Neural Contextual Bandit for Recommender Systems
- Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning
- Intrinsically Motivated Graph Exploration Using Network Theories of Human Curiosity
- Neural Collaborative Filtering vs. Matrix Factorization Revisited
- Variational Autoencoders for Collaborative Filtering
- Collaborative Deep Learning for Recommender Systems
- Reconciling the accuracy-diversity trade-off in recommendations
- A Contextual-Bandit Approach to Personalized News Article Recommendation
Original note title
scalable neural contextual bandits enable sample-efficient exploration via epistemic neural networks supporting Thompson sampling at scale