SYNTHESIS NOTE
Model Architecture and Internals · Recommender Systems

Why does dot product beat MLP-based similarity in practice?

The theory behind Neural Collaborative Filtering suggests that MLPs, as universal approximators, should match or outperform dot products. What explains the empirical gap in the other direction, and what role do data scale and deployment constraints play?

Synthesis note · 2026-05-03 · sourced from Recommenders Architectures

Neural Collaborative Filtering (NCF) popularized replacing the dot product between user and item embeddings with a learned MLP, on the theory that an MLP, as a universal function approximator, should subsume the dot product as a special case. Rendle and colleagues revisit the experiments and report two non-obvious results.

First, with proper hyperparameter tuning, the simple dot product substantially outperforms the MLP-based similarity. The gain reported in the original NCF work came from an undertuned dot-product baseline, not from MLP expressiveness. Second, even though an MLP can in theory approximate any function, learning a dot product with an MLP requires both a large model and a large training set: the inductive bias of MLPs makes the dot-product structure expensive to recover from data.
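The contrast between the two similarity functions can be made concrete with a minimal NumPy sketch. The embedding size, hidden width, and the one-hidden-layer ReLU architecture are illustrative assumptions, not the configuration used in the NCF paper or by Rendle and colleagues. The sketch shows that the dot product multiplies matching coordinates of the two embeddings directly, whereas the first layer of an MLP over concatenated embeddings is purely additive in the user and item vectors, so any multiplicative interaction has to be approximated through the nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 8, 16  # illustrative sizes, not taken from the paper

p = rng.normal(size=d)  # user embedding
q = rng.normal(size=d)  # item embedding


def dot_similarity(p, q):
    # Multiplicative interaction: sum_k p[k] * q[k], no extra parameters.
    return float(p @ q)


# Hypothetical one-hidden-layer MLP over the concatenated embeddings.
W1 = rng.normal(size=(hidden, 2 * d))
b1 = np.zeros(hidden)
w2 = rng.normal(size=hidden)


def mlp_similarity(p, q):
    # Affine map of [p; q], ReLU, then a linear readout to a scalar score.
    pre = W1 @ np.concatenate([p, q]) + b1
    return float(w2 @ np.maximum(pre, 0.0))


# The first layer is additive in p and q: W1 @ [p; q] == A @ p + B @ q.
# No term of the form p[k] * q[k] appears before the nonlinearity.
A, B = W1[:, :d], W1[:, d:]
first_layer_is_additive = np.allclose(W1 @ np.concatenate([p, q]), A @ p + B @ q)
```

Under these assumptions, the MLP has to spend hidden units and training data to build up products that the dot product provides for free, which is one way to read the "large model and large training set" requirement.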

The practical bite is in inference. A dot-product model can be served with approximate Maximum Inner Product Search (MIPS) algorithms, which retrieve the top-K items in sublinear time over millions of items. An MLP similarity requires a forward pass for every (user, item) pair, so scoring cost grows linearly with catalog size on each query, which is intractable at production scale. The paper concludes that MLPs as embedding combiners should be "used with care". The point is reinforced by the fact that the DNN architectures most competitive in NLP (transformers) and vision (ResNets) all use dot products in their output layers. Universal approximation does not make an operator a universally good choice; its inductive bias interacts with data scale and serving constraints.
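The serving difference can be sketched as follows. This is a rough illustration in NumPy, not a production retrieval system: the catalog size, embedding dimension, and the helper names `top_k_dot` and `top_k_learned` are hypothetical, and the dot-product path below is an exact exhaustive scan standing in for where an approximate MIPS index would go.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d, k = 10_000, 32, 5  # illustrative sizes

item_emb = rng.normal(size=(n_items, d))
user_emb = rng.normal(size=d)


def top_k_dot(user, items, k):
    # One matrix-vector product scores the whole catalog. Because the score
    # is an inner product, this exhaustive scan could be replaced by an
    # approximate MIPS index to avoid touching every item.
    scores = items @ user
    top = np.argpartition(-scores, k)[:k]  # unordered top-k candidates
    return top[np.argsort(-scores[top])]   # ordered by descending score


def top_k_learned(user, items, k, score_fn):
    # A learned similarity exposes no inner-product structure to index, so
    # every (user, item) pair is passed through score_fn individually.
    scores = np.array([score_fn(user, item) for item in items])
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]


dot_top = top_k_dot(user_emb, item_emb, k)
# Using the dot product as score_fn only to check that both paths agree;
# with a real MLP the per-pair call would be a full forward pass.
pairwise_top = top_k_learned(user_emb, item_emb, k, lambda u, i: float(u @ i))
```

Both functions return the same ranking when given the same scoring rule; the difference is that only the first has a structure that indexing can exploit.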

Original note title

MLP-based similarity underperforms dot product despite being a universal function approximator — inductive bias matters more than capacity