Does cosine similarity actually measure embedding similarity?
Cosine similarity is ubiquitous for comparing learned embeddings, but does it reliably capture semantic closeness? This work investigates whether regularization during training makes cosine scores arbitrary and unstable.
Cosine similarity is the default tool for quantifying semantic similarity between learned embeddings, on the intuition that direction matters more than norm. This paper shows that intuition is unsafe. Working with regularized linear (matrix-factorization) models, whose closed-form solutions make the analysis tractable, it derives that cosine similarities can be arbitrary and therefore meaningless: for some models they are not even unique, and for others they are implicitly controlled by the regularization applied during training. Because deep models combine several regularizers with implicit and unintended effects, cosine similarities of their embeddings can produce opaque and possibly arbitrary results.
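The non-uniqueness case can be made concrete with a short derivation. It assumes, for illustration, a linear model that reconstructs a user-item matrix X as XAB^T with a penalty on the product AB^T. The symbols A, B, D, λ and b_i are introduced here and are not taken from this note. For any invertible diagonal D, rescaling the factors leaves the loss unchanged while changing the cosine between item embeddings b_i (the rows of B, written as column vectors):

```latex
\begin{aligned}
\mathcal{L}(A,B) &= \lVert X - XAB^\top \rVert_F^2 + \lambda \lVert AB^\top \rVert_F^2, \\
(AD)(BD^{-1})^\top &= A D D^{-1} B^\top = AB^\top
  \;\Rightarrow\; \mathcal{L}(AD,\, BD^{-1}) = \mathcal{L}(A,B), \\
\cos\!\big(D^{-1} b_i,\; D^{-1} b_j\big)
  &= \frac{b_i^\top D^{-2} b_j}{\lVert D^{-1} b_i \rVert \, \lVert D^{-1} b_j \rVert}
  \;\neq\; \cos(b_i, b_j) \quad \text{in general.}
\end{aligned}
```

If the penalty is instead placed on the factors separately, for example λ(‖XA‖_F² + ‖B‖_F²), the rescaling does change the loss. The regularizer then pins down the scale of each dimension and, with it, the cosine values.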
The keeper is a methodological caution with teeth: the same data and model can yield different "similarities" depending on regularization the practitioner never explicitly chose for similarity, so a cosine score is not a stable, model-independent measure of semantic closeness. The paper outlines alternatives and cautions against applying cosine blindly.
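The effect can be checked numerically. This minimal sketch in plain Python assumes a model whose scores are dot products between user embeddings P and item embeddings Q, trained with a loss that depends only on the product P Q^T. The toy numbers, the rescaling vector d and the helper functions are all hypothetical. Rescaling the embedding dimensions by D and D^-1 leaves every prediction unchanged but moves the item-item cosine from 0.0 to -0.6.

```python
import math


def scale_columns(M, d):
    """Return M @ diag(d): multiply column j of every row by d[j]."""
    return [[x * s for x, s in zip(row, d)] for row in M]


def predict(P, Q):
    """Return P @ Q^T: the dot-product scores the model outputs."""
    return [[sum(p * q for p, q in zip(u, v)) for v in Q] for u in P]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy embeddings: 2 users and 3 items in k = 2 dimensions.
P = [[1.0, 2.0], [3.0, -1.0]]
Q = [[1.0, 1.0], [1.0, -1.0], [2.0, 1.0]]

# Rescale the dimensions: P -> P D and Q -> Q D^{-1} with D = diag(d).
# A loss that depends only on P @ Q^T cannot tell the two solutions apart.
d = [2.0, 1.0]
P2 = scale_columns(P, d)
Q2 = scale_columns(Q, [1.0 / s for s in d])

same_scores = all(
    math.isclose(a, b, abs_tol=1e-12)
    for row1, row2 in zip(predict(P, Q), predict(P2, Q2))
    for a, b in zip(row1, row2)
)

print("identical predictions:", same_scores)                       # True
print("cos(item0, item1) before:", round(cosine(Q[0], Q[1]), 4))    # 0.0
print("cos(item0, item1) after: ", round(cosine(Q2[0], Q2[1]), 4))  # -0.6
```

Cosine computed on the prediction rows themselves, rather than on the embeddings, is unaffected by this particular rescaling because those rows are identical; this is one possible workaround under the stated assumption, not necessarily one of the paper's alternatives.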
This sharpens the vault's embedding-geometry caveats. It adds regularization-dependence as the complement to the geometry-dependence in Why can't cosine space retrievers distinguish word order?, and it underwrites the production-RAG warning in Do vector embeddings actually measure task relevance? Taken together, cosine over learned embeddings is doubly unreliable: it has the wrong target (association) and it is an unstable measure (regularization-controlled).
Related concepts in this collection
- Why can't cosine space retrievers distinguish word order?
Dense retrievers using unit-sphere cosine spaces struggle to capture non-commutative linguistic structures like negation and role reversal. Understanding this geometric constraint explains why training fixes have limited reach in compositional retrieval.
geometry-dependence; this adds regularization-dependence
- Do vector embeddings actually measure task relevance?
Vector embeddings rank semantic similarity, but RAG systems need topical relevance. When these diverge—as with king/queen versus king/ruler—does similarity-based retrieval fail in production?
cosine over embeddings is wrong target and unstable measure
- Why does dot product beat MLP-based similarity in practice?
Neural Collaborative Filtering theory suggests MLPs should outperform dot products as universal approximators. But what explains the empirical gap, and what role do data scale and deployment constraints play?
adjacent caution on naive similarity functions over embeddings
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Is Cosine-Similarity of Embeddings Really About Similarity?
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words
- Training for Compositional Sensitivity Reduces Dense Retrieval Generalization
- On the Theoretical Limitations of Embedding-Based Retrieval
- Neural Collaborative Filtering vs. Matrix Factorization Revisited
- Semantic Structure in Large Language Model Embeddings
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
- Wide & Deep Learning for Recommender Systems
Original note title
cosine similarity of learned embeddings can be arbitrary and meaningless because it is implicitly controlled by regularization