Does the same spectral signature appear across different embedding models?
This explores whether the spectral structure found inside embeddings — the coarse-to-fine ordering of eigenvectors that mirrors a taxonomy — recurs across models built with different objectives, and what that recurrence would mean.
The corpus gives an unusually clean answer: yes. The most striking evidence is that word2vec embeddings and Gemma 2B unembeddings share an identical coarse-to-fine spectral signature across WordNet taxonomies (see "Do language models use the hierarchical geometry they inherit?"). The two models were trained with very different objectives, so the shared structure can't be a design choice that either one made. It has to come from something both training sets have in common: the co-occurrence statistics of text itself.
That's the deeper payoff hiding in the question. The leading eigenvectors of an embedding's Gram matrix don't carve up meaning randomly; they split broad taxonomic branches first and progressively finer sub-branches after, tracking the WordNet hypernym tree level by level (see "Do embedding eigenvectors organize taxonomy from coarse to fine?"). When two unrelated models reproduce that same ordering, the implication is that models don't invent hierarchy because hierarchy is useful; they inherit it, almost passively, from the statistical shape of language. The signature is a fingerprint of the data, not of the architecture.
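As a minimal sketch of that coarse-to-fine ordering, the toy example below builds embeddings for the eight leaves of a balanced binary tree and checks which tree depth each leading Gram eigenvector resolves. Everything in it is an assumption made for illustration: the tree, the scales, the function names (`toy_taxonomy_embeddings`, `gram_spectrum`, `coarsest_depth`, `spectral_signature`), and the second "model", which is only a rescaled copy rotated into a larger space. It is not the procedure applied to word2vec, Gemma 2B, or WordNet.

```python
import numpy as np

N_LEAVES = 8  # leaves of a hypothetical balanced binary tree of depth 3

def toy_taxonomy_embeddings(scales):
    """Each leaf = scaled sum of one-hot codes for its depth-1, depth-2, depth-3 nodes."""
    blocks = []
    for depth, scale in enumerate(scales, start=1):
        n_groups = 2 ** depth
        group_of_leaf = np.arange(N_LEAVES) // (N_LEAVES // n_groups)
        blocks.append(scale * np.eye(n_groups)[group_of_leaf])
    return np.hstack(blocks)  # shape (8, 2 + 4 + 8)

def gram_spectrum(X):
    """Eigen-decompose the centered Gram matrix, largest eigenvalue first."""
    Xc = X - X.mean(axis=0, keepdims=True)
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

def coarsest_depth(v, max_depth=3, tol=1e-8):
    """Shallowest depth at which v is constant within every group of leaves."""
    for depth in range(1, max_depth + 1):
        group_size = N_LEAVES // 2 ** depth
        smoothed = np.repeat(v.reshape(-1, group_size).mean(axis=1), group_size)
        if np.linalg.norm(smoothed - v) < tol:
            return depth
    return None

def spectral_signature(X, n_vectors=7):
    """Tree depth resolved by each of the leading Gram eigenvectors."""
    _, evecs = gram_spectrum(X)
    return [coarsest_depth(evecs[:, i]) for i in range(n_vectors)]

# "Model A": the toy embeddings as built.
model_a = toy_taxonomy_embeddings(scales=(3.0, 2.0, 1.0))

# "Model B": different scales, mapped into 32 dimensions by a random orthonormal basis.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((32, 14)))  # columns are orthonormal
model_b = toy_taxonomy_embeddings(scales=(5.0, 1.5, 0.5)) @ basis.T

signature_a = spectral_signature(model_a)
signature_b = spectral_signature(model_b)
print(signature_a, signature_b)
```

In this toy setup both signatures come out as `[1, 2, 2, 3, 3, 3, 3]`: one eigenvector for the top-level split, two for the next level, four for the leaf-level splits. The two "models" agree because they share the same similarity structure, not because they share a dimension or a coordinate system, which is the sense in which a Gram-matrix signature can be compared across embedding spaces.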
This convergence rhymes with a broader pattern the corpus keeps surfacing: different models, left to their own devices, end up in the same place. Across 70+ models and 26K open-ended queries, researchers found an "Artificial Hivemind": independent models generating strikingly similar outputs because they draw on overlapping training data and alignment procedures (see "Do different AI models actually produce diverse outputs?"). The spectral-signature finding is the geometric, under-the-hood version of that behavioral observation. Both trace back to a shared cause (overlapping training data) and arrive at the same outcome (convergence), seen at two different layers of the system.
There's a useful tension worth holding, though. Embeddings encode association from co-occurrence rather than meaning chosen for a task (see "Do vector embeddings actually measure task relevance?"), which is exactly why a data-derived signature would transfer across models. It is also why standard spectral and linear analysis can mislead. One line of work shows that PCA, RSA, and linear methods are systematically biased toward simple, linear features and under-represent equally real nonlinear ones (see "Do standard analysis methods hide nonlinear features in neural networks?"). So a shared spectral signature may be real and yet incomplete: the part of the geometry that's easy to see in eigenvectors is the part most likely to look the same everywhere, while model-specific structure could live in places the spectrum doesn't expose.
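One concrete way a linear readout can miss a real feature is sketched below. This is a toy assumed for illustration, not an experiment from the cited work: the feature is hidden as the product of two activation coordinates (an XOR-style encoding), and the helper name `r_squared`, the sample size, and the noise level are all hypothetical choices.

```python
import numpy as np

def r_squared(features, target):
    """R^2 of an ordinary least-squares fit (with intercept) of target on features."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return 1.0 - residual.var() / target.var()

rng = np.random.default_rng(0)
n = 2000

# Two latent binary factors in {-1, +1}; the feature of interest is their product.
latent = rng.choice([-1.0, 1.0], size=(n, 2))
feature = latent[:, 0] * latent[:, 1]

# Toy "activations": the latent factors plus small Gaussian noise.
acts = latent + 0.1 * rng.standard_normal((n, 2))

# Linear probe on the raw activations.
r2_linear = r_squared(acts, feature)

# Same probe after adding one nonlinear coordinate (product of the two activations).
acts_with_product = np.column_stack([acts, acts[:, 0] * acts[:, 1]])
r2_nonlinear = r_squared(acts_with_product, feature)

print(f"linear R^2: {r2_linear:.3f}, with product term: {r2_nonlinear:.3f}")
```

The linear probe explains almost none of the feature even though it is fully determined by the activations, while a single product term recovers most of it. A spectrum computed from linear structure alone would be similarly blind to a feature encoded this way.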
If you want to go further, two adjacent doorways reframe what the signature is for. Static (pre-attention) embeddings already carry rich semantic content such as valence, concreteness, and iconicity, meaning the inherited structure is loaded before any model-specific computation runs (see "Do transformer static embeddings actually encode semantic meaning?"). And there are hard ceilings on what embedding geometry can represent at all: for any embedding dimension, communication-complexity theory bounds the number of top-k document combinations an embedding can return (see "Do embedding dimensions fundamentally limit retrievable document combinations?"). A signature shared across models tells you what they have in common; these limits tell you what none of them can escape.
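The dimension ceiling is easiest to see in its most extreme case. The sketch below is an illustrative assumption, not the general bound from the cited work: it counts the distinct top-k document sets that dot-product scoring can return when documents live in one dimension versus eight. The helper `reachable_topk_sets`, the corpus size, and the random queries are hypothetical.

```python
import math
import numpy as np

def reachable_topk_sets(doc_emb, queries, k):
    """Collect every distinct top-k document set produced by dot-product scoring."""
    seen = set()
    for q in queries:
        scores = doc_emb @ q
        topk = np.argsort(scores)[::-1][:k]
        seen.add(frozenset(topk.tolist()))
    return seen

rng = np.random.default_rng(0)
n_docs, k = 10, 2
all_combinations = math.comb(n_docs, k)  # every pair one could ask for

# One-dimensional embeddings: a query can only flip the sort order, so at most
# two top-k sets exist (the k largest or the k smallest document values).
docs_1d = rng.standard_normal((n_docs, 1))
queries_1d = rng.standard_normal((5000, 1))
reached_1d = reachable_topk_sets(docs_1d, queries_1d, k)

# Eight-dimensional embeddings of the same number of documents reach more sets.
docs_8d = rng.standard_normal((n_docs, 8))
queries_8d = rng.standard_normal((5000, 8))
reached_8d = reachable_topk_sets(docs_8d, queries_8d, k)

print(len(reached_1d), len(reached_8d), all_combinations)
```

With one dimension, only 2 of the 45 possible pairs can ever be returned, no matter how the query is chosen. Raising the dimension widens the set of reachable combinations, which is the qualitative relationship the bound formalizes.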
Sources (7 notes)
Word2vec embeddings and Gemma 2B unembeddings share identical coarse-to-fine spectral signatures across WordNet taxonomies. Since these models have entirely different objectives, the shared structure must originate from training text statistics rather than convergent functional needs.
Leading eigenvectors of embedding Gram matrices separate broad taxonomic branches first, then progressively finer sub-branches—a coarse-to-fine spectral order that tracks the WordNet hypernym tree level by level, confirming predictions from co-occurrence statistics.
INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.
Embeddings encode co-occurrence patterns, making semantically close but role-distinct concepts highly similar. This works in simple demos but fails in production where underspecified queries have many wrong-but-associated candidates.
PCA, linear regression, and RSA over-represent simple linear features while under-representing equally important nonlinear features. Homomorphic encryption demonstrates that networks can compute perfectly well with no interpretable activation structure, proving representation patterns and computation can be entirely decoupled.
Clustering analysis of RoBERTa embeddings reveals sensitivity to five psycholinguistic measures including valence, concreteness, iconicity, and taboo. This demonstrates that static embeddings function as genuine lexical entries containing semantic content before self-attention operates.
Communication complexity theory proves that for any embedding dimension d, there exists a maximum number of top-k document combinations that can be returned as results. Even embeddings optimized directly on test data hit this polynomial limit, demonstrated on trivially simple retrieval tasks.