Can a linear model beat deep collaborative filtering?

Lines of inquiry that draw on this note

This note is a source for the following research framings, grouped by the broader line of inquiry each explores.

What structural factors drive popularity bias in recommendation systems?

How can LLM recommenders match or exceed collaborative filtering performance?

Can embedding-based integration preserve both LLM text strength and collaborative filtering signal?
Why do LLM recommenders underperform item-only collaborative filtering baselines?
Why does inductive bias outweigh model capacity in recommender systems?
Why do embedding-based recommendation models fail with sparse user history?
What non-linear patterns do autoencoders discover that matrix factorization misses?
Can structural priors outperform raw model capacity in collaborative filtering?
Can simpler collaborative filtering models outperform deep architectures?
How does per-user sparsity influence likelihood choice for recommendations?
Why do multinomial likelihoods outperform Gaussian models for recommendation?
Can hypernetworks generate recommendation parameters more efficiently than retraining full models?

What dimensions of recommendation quality do standard metrics miss?

Why do linear hybrid models fail to capture user-item relationships?

Can graph structure and relationships fundamentally improve recommendation systems?

Why do semantic similarity and task relevance diverge in vector embeddings?

Why do dual-encoder embeddings fail to capture task-relevant recommendations despite semantic similarity?

How do multi-agent systems achieve genuine cooperation and reasoning?

How does this compare to trained autoencoder approaches for thought sharing?

How does sequence length affect sparsity tolerance in models?

What limits mechanistic interpretability's ability to characterize models?

Which computational strategies best support reasoning in language models?

Why do singular value experts compose better than low-rank adapter subspaces?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

Can autoencoders act as associative memory systems like Hopfield networks?

How can identical external performance mask different internal representations?

Does AI fluency substitute for verifiable accuracy in human judgment?

What does a human-parseable framework for deep learning look like?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

What tasks does recurrent depth solve that feedforward models cannot?

Related concepts in this collection

Can simpler models beat deep networks for recommendation systems? Does removing hidden layers and forcing each item's self-similarity weight to zero yield a more effective collaborative filtering approach than deep autoencoders? This challenges the assumption that architectural depth drives performance.
extends: paired re-statement of the same EASE^R result, emphasizing the precision-matrix-vs-covariance distinction
Why does dot product beat MLP-based similarity in practice? Neural Collaborative Filtering theory suggests MLPs should outperform dot products as universal approximators. But what explains the empirical gap, and what role do data scale and deployment constraints play?
complements: paired anti-deep-CF lesson — the right inductive bias matters more than the universal approximation guarantee
Can MLPs learn to match dot product similarity in practice? Universal approximation theory suggests MLPs should learn any similarity function, including dot product. But does this theoretical promise hold up when training on real, finite datasets with practical constraints?
complements: the capacity-vs-bias point made at the similarity layer; EASE^R makes it at the architecture-depth layer
Why does multinomial likelihood work better for click prediction? Explores whether the choice of likelihood function—multinomial versus Gaussian or logistic—affects recommendation performance, and what structural properties make one better suited to modeling user clicks.
complements: another simpler-with-the-right-prior result — likelihood choice matters more than depth
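The EASE^R result referenced above (Steck, "Embarrassingly Shallow Autoencoders for Sparse Data") is a linear item-item model with no hidden layer and a zero-diagonal constraint, and it has a closed-form solution: with P = (XᵀX + λI)⁻¹, the weights are B_ij = -P_ij / P_jj for i ≠ j and B_jj = 0. The weights are read off the inverse P (a precision-like matrix), not off the co-occurrence matrix XᵀX itself. The following is a minimal NumPy sketch, assuming a dense interaction matrix small enough to invert in memory; the names `ease_fit` and `ease_scores`, the value of `lam`, and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def ease_fit(X, lam=1.0):
    """Closed-form EASE^R item-item weight matrix (illustrative sketch).

    X   : (n_users, n_items) interaction matrix (dense, binary here).
    lam : L2 regularization strength; 1.0 is an arbitrary illustrative value.
    """
    X = np.asarray(X, dtype=np.float64)
    G = X.T @ X                            # item-item Gram matrix X^T X
    G[np.diag_indices_from(G)] += lam      # X^T X + lam * I
    P = np.linalg.inv(G)                   # P = (X^T X + lam * I)^{-1}
    B = -P / np.diag(P)                    # B_ij = -P_ij / P_jj (column j scaled by 1/P_jj)
    B[np.diag_indices_from(B)] = 0.0       # zero diagonal: an item cannot predict itself
    return B

def ease_scores(X, B):
    """Score every item for every user as a linear combination of their history."""
    return np.asarray(X, dtype=np.float64) @ B

# Toy usage: 4 users x 4 items (made-up data).
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=np.float64)

B = ease_fit(X, lam=1.0)
scores = ease_scores(X, B)
scores[X > 0] = -np.inf                    # do not recommend items already seen
top_item = np.argmax(scores, axis=1)       # best unseen item per user
```

Because training is a single inverse of an items-by-items matrix, the cost grows with the number of items rather than the number of users.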

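The likelihood comparison can be made concrete with the two log-likelihoods written out in the Mult-VAE paper (Variational Autoencoders for Collaborative Filtering, in the related papers list). The multinomial scores a user's click vector against a softmax over all items, so items compete for a fixed probability budget. The Gaussian scores each item's squared error independently. The function names, the scalar confidence `c`, and the toy vectors below are illustrative assumptions. Additive constants are dropped, and the logistic variant is omitted.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D score vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def multinomial_loglik(x, logits):
    """Multinomial log-likelihood of click counts x, dropping the constant
    multinomial coefficient: sum_i x_i * log softmax(logits)_i."""
    return float(np.sum(x * log_softmax(logits)))

def gaussian_loglik(x, mu, c=1.0):
    """Gaussian log-likelihood with confidence weight c, dropping constants:
    -sum_i (c / 2) * (x_i - mu_i)^2."""
    return float(-np.sum(0.5 * c * (x - mu) ** 2))

# Made-up click vector over 5 items: the user clicked items 0 and 3.
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
good = np.array([3.0, 0.0, 0.0, 3.0, 0.0])   # scores favour the clicked items
bad = np.array([0.0, 3.0, 3.0, 0.0, 0.0])    # scores favour unclicked items

mult_good = multinomial_loglik(x, good)
mult_bad = multinomial_loglik(x, bad)

# Softmax is shift-invariant, so only relative scores matter to the multinomial...
mult_shifted = multinomial_loglik(x, good + 5.0)
# ...whereas the Gaussian penalizes the absolute gap on every item, clicked or not.
gauss_good = gaussian_loglik(x, good)
gauss_shifted = gaussian_loglik(x, good + 5.0)
```

Shifting every score by the same constant leaves the multinomial value unchanged but lowers the Gaussian one. This is one informal way to read the paper's argument that the multinomial behaves more like a top-N ranking objective.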
Related papers in this collection

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Embarrassingly Shallow Autoencoders for Sparse Data* (arXiv; 0.88 match)
Variational Autoencoders for Collaborative Filtering (arXiv; 0.86 match)
Collaborative Deep Learning for Recommender Systems (arXiv; 0.84 match)
Neural Collaborative Filtering vs. Matrix Factorization Revisited (arXiv; 0.84 match)
Neural Collaborative Filtering (arXiv; 0.82 match)
GHRS: Graph-based Hybrid Recommendation System with Application to Movie Recommendation (arXiv; 0.81 match)
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) (arXiv; 0.81 match)
Wide & Deep Learning for Recommender Systems (arXiv; 0.81 match)