Can neural networks explore efficiently at recommendation scale?

Exploration, meaning the discovery of user preferences the system has not yet observed, normally requires expensive posterior uncertainty estimates. Can a neural architecture make Thompson sampling practical for real-world recommenders without prohibitive computational cost?

Synthesis note · 2026-05-03 · sourced from Recommenders Architectures

Supervised neural networks form the backbone of most recommenders, but they exploit only the user interests they have already recognized. Discovering unknown user preferences requires exploration, and the standard exploration framework (contextual bandits with Thompson sampling) requires posterior uncertainty estimates, which are computationally prohibitive for large neural networks at recommendation scale.

Zhu et al. propose Epistemic Neural Recommendation (ENR), an epistemic neural network architecture designed to enable Thompson sampling at scale. Epistemic neural networks separate aleatoric uncertainty (irreducible noise in outputs) from epistemic uncertainty (uncertainty about the model's parameters). The latter is what Thompson sampling needs: sample a parameter setting from the posterior, choose actions according to that setting, observe outcomes, and update the posterior.
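The four-step loop described above can be sketched in its simplest form. This is a Beta-Bernoulli bandit with no context features and no neural network, so it illustrates Thompson sampling in general, not ENR. The function name, the `true_ctr` click probabilities, and the simulated clicks are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_ctr, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed set of items.

    true_ctr is a hypothetical list of click probabilities that the
    agent cannot see; it is used only to simulate user feedback.
    """
    rng = random.Random(seed)
    n_items = len(true_ctr)
    # Beta(1, 1) prior per item: alpha tracks clicks + 1, beta tracks skips + 1.
    alpha = [1.0] * n_items
    beta = [1.0] * n_items
    pulls = [0] * n_items

    for _ in range(n_rounds):
        # 1. Sample one plausible click rate per item from its posterior.
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(n_items)]
        # 2. Act greedily with respect to the sampled rates.
        item = max(range(n_items), key=lambda i: sampled[i])
        # 3. Observe the outcome (simulated here).
        clicked = rng.random() < true_ctr[item]
        # 4. Update the posterior of the item that was shown.
        if clicked:
            alpha[item] += 1
        else:
            beta[item] += 1
        pulls[item] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.05, 0.10, 0.60], n_rounds=2000)
    print("times each item was shown:", pulls)
```

Exploration falls out of step 1: items with few observations have wide posteriors, so their sampled rates are occasionally high enough to win the argmax. For a large neural recommender the expensive part is step 1, since there is no closed-form posterior to sample from.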

Empirically, ENR boosts click-through rates by at least 9% and user ratings by at least 6% compared to state-of-the-art neural contextual bandit algorithms. It reaches equivalent performance with at least 29% fewer user interactions than the best-performing baseline. Computationally, it demands orders of magnitude fewer resources than other neural contextual bandit baselines, which moves Thompson-sampling-based exploration from research-only to production-feasible.

The general principle: when a Bayesian technique seems too expensive at scale, ask whether the expensive part is genuinely necessary or whether a structural approximation captures what's needed. Epistemic networks make a focused commitment to estimating only the parameter uncertainty Thompson sampling actually uses, dropping the rest. The architectural simplification is what unlocks scale.
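One way to picture the structural approximation is index-conditioned prediction: the model takes an extra epistemic index, and the spread of its outputs across indices stands in for parameter uncertainty. The sketch below is not the ENR architecture. It assumes the simplest case, a small ensemble of linear heads on top of a shared base, and every name and weight in it (`make_model`, `score`, `recommend`, the random placeholder weights) is hypothetical.

```python
import random


def make_model(n_features, n_heads, rng):
    """Hypothetical stand-in for a trained model: shared base weights
    plus one small perturbation head per epistemic index."""
    base = [rng.gauss(0, 1) for _ in range(n_features)]
    heads = [[rng.gauss(0, 0.3) for _ in range(n_features)]
             for _ in range(n_heads)]
    return base, heads


def score(model, x, z):
    """Predicted reward for feature vector x under epistemic index z."""
    base, heads = model
    return sum((b + h) * xi for b, h, xi in zip(base, heads[z], x))


def recommend(model, candidates, rng):
    """Approximate Thompson step: draw one index, then act greedily."""
    _, heads = model
    z = rng.randrange(len(heads))  # one index per decision, shared by all items
    scores = [score(model, x, z) for x in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, z


if __name__ == "__main__":
    rng = random.Random(0)
    model = make_model(n_features=4, n_heads=8, rng=rng)
    candidates = [[rng.random() for _ in range(4)] for _ in range(5)]
    best, z = recommend(model, candidates, rng)
    print("index", z, "recommends item", best)
```

Drawing the index replaces sampling from a full posterior over network weights, so each decision costs one scoring pass per candidate. Whether the spread across indices is a good uncertainty estimate depends entirely on how the heads are trained, which this sketch does not cover.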

Inquiring lines that read this note 14

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How can LLM recommenders match or exceed collaborative filtering performance?

Can alternative training methods improve on supervised fine-tuning for language models?

How do neural networks extend contextual bandits beyond linear reward assumptions?

Can graph structure and relationships fundamentally improve recommendation systems?

Why do real-world platforms need inductive learning for streaming recommendation systems?

Why does reinforcement learning suppress output diversity compared to supervised fine-tuning?

How does example difficulty affect learning efficiency in language models?

Why does exploration quality matter more than learner network depth?

How should iterative research systems allocate reasoning per search step?

Which computational strategies best support reasoning in language models?

How many particles and iterations does optimal expert discovery require?

How should dialogue recommender systems manage conversation history and state?

How can insert-expansion techniques help users discover their own preferences?

What makes weaker teacher models effective for stronger student training?

Can we cheaply estimate which samples are currently most informative?

Related concepts in this collection 4

Can bandit algorithms beat collaborative filtering for news? News recommendation faces constant content churn and cold-start users—settings where traditional collaborative filtering struggles. Can a contextual bandit approach like LinUCB explicitly balance exploration and exploitation better than static methods?
extends: the epistemic neural network (ENN) approach scales the LinUCB framework beyond linear-reward assumptions while preserving the bandit framing
When can greedy bandits skip exploration entirely? Under what conditions does natural randomness in incoming contexts eliminate the need for active exploration in contextual bandits? This matters for high-stakes domains like medicine where exploration carries real costs.
tension with: ENN scales exploration, whereas greedy-first avoids it under context diversity; the design choice depends on context-distribution structure
Can implicit feedback reveal both preference and confidence? When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
complements: epistemic uncertainty in ENN is the bandit-style confidence signal that exploration acts on
Why do academic recommenders fail when deployed in production? Academic recommendation models assume static test sets known at training time, but real platforms continuously receive new users, items, and interactions. Understanding this gap reveals what production systems actually need.
complements: bandit framing assumes inductive learning; ENR is the production-scale exploration primitive for inductive recommenders
