SYNTHESIS NOTE

Do embedding eigenvectors organize taxonomy from coarse to fine?

Can we predict how embeddings encode taxonomic hierarchies by examining their spectral structure? This tests whether word co-occurrence statistics alone produce the observed hierarchical geometry in language models.

Synthesis note · 2026-05-28 · sourced from MechInterp

The hierarchical geometry of concept embeddings is not just present but ordered in a specific way. When you take the embedding Gram matrix and read off its leading eigenvectors, the first ones separate the broadest taxonomic branches; later eigenvectors split progressively finer sub-branches. The spectral organization is coarse-to-fine, and it tracks the WordNet hypernym tree level by level. This is a stronger claim than "the representation has hierarchical structure" — it specifies where in the spectrum each level of the taxonomy lives.
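The reading-off procedure described above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the source's method or data: the balanced binary tree, the orthonormal per-node directions, the `decay` weighting, and the function names `toy_tree_embeddings` and `centered_gram_spectrum` are all hypothetical choices, and the Gram matrix is centered (as in PCA) so that the mean direction drops out.

```python
import numpy as np

def toy_tree_embeddings(depth=3, dim=32, decay=0.5, seed=0):
    """Embed the leaves of a balanced binary tree (hypothetical toy data).

    Every non-root node gets its own orthonormal direction. A leaf is the
    sum of the directions on its root-to-leaf path, weighted by
    decay**level, so deeper (finer) nodes contribute less.
    """
    rng = np.random.default_rng(seed)
    n_leaves = 2 ** depth
    n_nodes = 2 ** (depth + 1) - 2  # all nodes below the root
    # Orthonormal columns, one per node (requires dim >= n_nodes).
    q, _ = np.linalg.qr(rng.standard_normal((dim, n_nodes)))
    X = np.zeros((n_leaves, dim))
    col = 0
    for level in range(1, depth + 1):
        width = 2 ** level                      # nodes at this level
        node_of_leaf = np.arange(n_leaves) // (n_leaves // width)
        X += (decay ** level) * q[:, col + node_of_leaf].T
        col += width
    return X

def centered_gram_spectrum(X):
    """Eigenvalues and eigenvectors of the centered Gram matrix, largest first."""
    Xc = X - X.mean(axis=0, keepdims=True)
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)    # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

X = toy_tree_embeddings()
evals, evecs = centered_gram_spectrum(X)
print(np.round(evals, 4))                   # one large value, then two, then four
print(np.sign(evecs[:, 0]).astype(int))     # sign pattern of the leading eigenvector
```

On this toy tree the leading eigenvector takes one sign on the first four leaves and the opposite sign on the last four (the overall sign is arbitrary), the next two eigenvectors are constant within sibling pairs, and the remaining four separate individual siblings. Real embeddings are noisier, so the pattern there would be approximate at best.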

The pattern is what makes the underlying co-occurrence theory falsifiable rather than merely suggestive. A purely descriptive observation that embeddings cluster by category could be explained many ways; a derived prediction that the principal components encode the taxonomy from coarse to fine, confirmed across many sampled WordNet subtrees, is a tight fit between a statistical mechanism and an observed geometry. The eigenvalue ordering is the fingerprint: dominant variance carries the broad ontological cuts (animal vs. artifact), residual variance carries the fine ones (terrier vs. spaniel).
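A worked four-leaf example may make the eigenvalue ordering concrete. The symbols here are assumptions introduced for illustration, not notation from the source: leaves 1 and 2 sit under one top-level branch, leaves 3 and 4 under the other, and the Gram matrix entries are a for self-similarity, b for siblings, and c for leaves in different branches, with a > b > c (similarity decays with tree distance).

```latex
K = \begin{pmatrix}
a & b & c & c \\
b & a & c & c \\
c & c & a & b \\
c & c & b & a
\end{pmatrix},
\qquad a > b > c .

\begin{aligned}
v_0 &= (1, 1, 1, 1)^\top,   & \lambda_0 &= a + b + 2c && \text{(mean direction)} \\
v_1 &= (1, 1, -1, -1)^\top, & \lambda_1 &= a + b - 2c && \text{(coarse split)} \\
v_2 &= (1, -1, 0, 0)^\top,  & \lambda_2 &= a - b      && \text{(fine split, branch 1)} \\
v_3 &= (0, 0, 1, -1)^\top,  & \lambda_3 &= a - b      && \text{(fine split, branch 2)}
\end{aligned}

\lambda_1 - \lambda_2 = 2\,(b - c) > 0 .
```

Setting aside the mean direction, the coarse split carries more variance than either fine split exactly when b > c, so in this toy case the coarse-to-fine ordering follows from nothing more than similarity decaying with tree distance.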

Why it matters: this gives interpretability a concrete, model-agnostic probe. To test whether a representation space encodes a taxonomy in the way co-occurrence statistics predict, check whether the eigenvector index at which each split first appears increases with that split's depth in the tree. The same probe applies to any embedding determined by co-occurrence, not just transformer internals. The counterpoint is that coarse-to-fine spectral order is exactly what generic kernel-decay assumptions produce, so finding it is evidence for the statistical account, not for a bespoke hierarchical computation.
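One way such a probe could be written is sketched below. Everything in it is an assumption made for illustration: the noise-free one-hot tree features, the between-group variance score, the 0.9 threshold, and the function names are hypothetical, and a real test would use model embeddings and WordNet hypernym labels instead of the toy tree.

```python
import numpy as np

def level_labels(depth):
    """For each tree level, the node id each leaf of a balanced binary tree falls under."""
    n = 2 ** depth
    return [np.arange(n) // (n // 2 ** level) for level in range(1, depth + 1)]

def one_hot_tree_features(labels_per_level, decay=0.5):
    """Toy embeddings: stacked one-hot node indicators, down-weighted with depth."""
    blocks = []
    for level, lab in enumerate(labels_per_level, start=1):
        onehot = (lab[:, None] == np.arange(lab.max() + 1)[None, :]).astype(float)
        blocks.append((decay ** level) * onehot)
    return np.hstack(blocks)

def explained_by_grouping(v, labels):
    """Fraction of v's variance that lies between the groups in `labels`."""
    v = v - v.mean()
    _, inv = np.unique(labels, return_inverse=True)
    sums = np.bincount(inv, weights=v)
    counts = np.bincount(inv)
    return np.sum(sums ** 2 / counts) / np.sum(v ** 2)

def eigenvector_depths(X, labels_per_level, n_components, threshold=0.9):
    """Assign each leading eigenvector the shallowest level that explains it."""
    Xc = X - X.mean(axis=0, keepdims=True)
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)
    order = np.argsort(evals)[::-1][:n_components]
    depths = []
    for k in order:
        scores = [explained_by_grouping(evecs[:, k], lab) for lab in labels_per_level]
        hits = [lvl for lvl, s in enumerate(scores, start=1) if s >= threshold]
        depths.append(hits[0] if hits else None)
    return depths

labels = level_labels(depth=3)
X = one_hot_tree_features(labels)
depths = eigenvector_depths(X, labels, n_components=7)
print(depths)  # [1, 2, 2, 3, 3, 3, 3]

# Coarse-to-fine ordering means assigned depth never decreases with eigenvector index.
is_coarse_to_fine = all(a <= b for a, b in zip(depths, depths[1:]))
print(is_coarse_to_fine)  # True
```

The toy passes by construction, which is the counterpoint in miniature: the features only assume that shared structure is down-weighted with depth. On real embeddings the scores would be graded rather than exactly 0 or 1, so the threshold and the choice of score would need to be justified.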

Inquiring lines that read this note 45

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

Is embodied interaction necessary for language meaning and genuine agency?

When does architectural design matter more than raw model capacity?

How do embedding dimension limits constrain what concept models can represent?

What role does compression play in language model capability and generalization?

What compression explains why syntax fits in low-dimensional subspaces?

How does reasoning graph topology affect breakthrough insights and generalization?

Do grokking phases correspond to transitions between nesting levels?

Why do semantic similarity and task relevance diverge in vector embeddings?

What factors beyond surface content determine how readers extract meaning differently?

What structural factors drive popularity bias in recommendation systems?

Do language model representations contain causally steerable task-specific features?

Can steering vectors prove that representations are genuinely organized?

What limits mechanistic interpretability's ability to characterize models?

Does model scaling alone produce compositional generalization without symbolic mechanisms?

How should dialogue systems best leverage conversation history for retrieval?

Can hierarchical key point structures improve opinion summarization?

Do language models develop causal world models or rely on statistical patterns?

Why must world models be nested rather than flat and uniform?

How do transformer attention mechanisms implement memory and algorithmic functions?

How does iconicity detection work within static embeddings before any attention?

How do neural networks separate factual knowledge from reasoning abilities?

How do hierarchical knowledge layers capture different types of narrative information?

How can identical external performance mask different internal representations?

Do generic kernel-decay assumptions alone explain coarse-to-fine spectral ordering?

Related concepts in this collection 3

Where does hierarchical structure in language models come from?
Do LLMs build hierarchical concept geometry through dedicated mechanisms, or does it emerge naturally from word co-occurrence patterns in training data? Understanding the source matters for interpreting what representations actually reveal about model computation.
this coarse-to-fine ordering is the specific prediction of the distributional mechanism

Does word frequency correlate with semantic abstraction?
Explores whether LLMs' preference for high-frequency language also pulls them toward more abstract, general meanings, and whether this shapes how they handle expert knowledge.
both ground the abstraction structure of representations in WordNet-level statistical regularities

Do language models use the hierarchical geometry they inherit?
Word2vec and Gemma share the same hierarchical spectral signature despite vastly different architectures and purposes. This suggests shared statistical origins, but leaves open whether the LLM actually recruits this structure for reasoning or simply inherits unused geometry.
grounds: the cross-model evidence that the coarse-to-fine spectral order is a statistical fingerprint, not a transformer-specific computation; the same probe applies to word2vec
