SYNTHESIS NOTE

Do language models use the hierarchical geometry they inherit?

Word2vec and Gemma share the same hierarchical spectral signature despite vastly different architectures and purposes. This suggests shared statistical origins, but leaves open whether the LLM actually recruits this structure for reasoning or simply inherits unused geometry.

Synthesis note · 2026-05-28 · sourced from MechInterp

The decisive move in the co-occurrence account of concept geometry is a cross-architecture comparison. The hierarchical splitting geometry is first derived and confirmed for word2vec embeddings across many WordNet subtrees. The same coarse-to-fine spectral signature is then shown to extend "strikingly well" to Gemma 2B unembeddings. Two systems with entirely different objectives and training regimes (a shallow predict-context embedding and the output matrix of a large autoregressive transformer) carry the same hierarchical fingerprint. If the structure were a functional artifact of how an LLM reasons, it should not appear in the same form in a model that does not reason at all.
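
The logic of that comparison can be sketched on synthetic data. The block below is a toy illustration, not the authors' procedure: `tree_features`, `random_isometry`, and `coarse_split_alignment` are hypothetical helpers, and `E_small` and `E_large` are synthetic stand-ins for two embedding matrices of different width, not word2vec or Gemma weights. It assumes both "models" embed the same eight words from the same balanced binary hierarchy, then checks whether the top principal direction of each word-by-word Gram matrix separates the two coarse branches.

```python
import numpy as np

def tree_features(depth):
    """One-hot membership of each leaf in every node of a balanced binary tree."""
    n = 2 ** depth
    cols = []
    for k in range(depth + 1):
        groups = np.arange(n) // 2 ** (depth - k)   # node index at level k
        cols.append(np.eye(2 ** k)[groups])
    return np.hstack(cols)                           # shape (n, 2**(depth+1) - 1)

def random_isometry(rng, in_dim, out_dim):
    """Random map with orthonormal rows, so inner products are preserved."""
    q, _ = np.linalg.qr(rng.standard_normal((out_dim, in_dim)))
    return q.T                                       # shape (in_dim, out_dim)

def coarse_split_alignment(E, coarse_labels):
    """Agreement in [0, 1] between the sign pattern of the top principal
    direction of the Gram matrix and a +/-1 coarse labelling of the words."""
    X = E - E.mean(axis=0, keepdims=True)            # centre across words
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)       # ascending eigenvalues
    top = eigvecs[:, -1]
    return abs(np.mean(np.sign(top) * coarse_labels))

rng = np.random.default_rng(0)
F = tree_features(3)                                 # 8 words, 15 tree nodes
labels = np.where(np.arange(8) < 4, 1.0, -1.0)       # coarse branch of each word

# Two synthetic "models": different widths, different random bases, small noise.
E_small = F @ random_isometry(rng, F.shape[1], 16)
E_large = F @ random_isometry(rng, F.shape[1], 64) + 0.01 * rng.standard_normal((8, 64))

score_small = coarse_split_alignment(E_small, labels)
score_large = coarse_split_alignment(E_large, labels)
print(score_small, score_large)
```

A matching score in both matrices shows only that the two share geometry. It says nothing about whether either system uses that geometry downstream.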

This is the strongest available argument that the geometry is statistical, not functional: a shared signature across architectures points to a shared cause upstream of both, namely the co-occurrence statistics of the training text, rather than to convergent functional design. On this account, each word is characterized by discrete, continuous, and hierarchical attributes; words with similar attributes co-occur more often; and that alone gives rise to the geometric organization. Both models inherit it because both are, in different ways, fitting the same pairwise statistics.
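
A minimal sketch of how hierarchical attributes alone can produce a coarse-to-fine spectrum, under a strong simplifying assumption that is not taken from the source: the similarity between two words is the weighted count of tree levels at which they share an ancestor. The function `hierarchical_similarity` and the equal level weights are hypothetical choices made for illustration.

```python
import numpy as np

def hierarchical_similarity(depth, level_weights):
    """Similarity matrix for the 2**depth leaves of a balanced binary tree.

    Two leaves gain level_weights[k] for every level k (0 = root) at which
    they fall under the same node.
    """
    n = 2 ** depth
    S = np.zeros((n, n))
    for k, w in enumerate(level_weights):
        groups = np.arange(n) // 2 ** (depth - k)    # node index at level k
        S += w * (groups[:, None] == groups[None, :])
    return S

S = hierarchical_similarity(3, [1.0, 1.0, 1.0, 1.0])

eigvals, eigvecs = np.linalg.eigh(S)                 # ascending order
order = np.argsort(eigvals)[::-1]                    # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.round(eigvals, 6))              # eigenvalues 15, 7, 3, 3, 1, 1, 1, 1
print(np.sign(eigvecs[:, 1]))            # one sign on leaves 0-3, the other on 4-7
```

In this toy case the leading eigenvector is constant, the second separates the two top-level branches, and the remaining ones split progressively finer groups, with eigenvalues that shrink as the splits get finer. No objective, architecture, or downstream task enters the calculation, only the similarity structure.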

Why it leaves a question open: the authors are explicit that such organization may be useful for function but is not driven by it, which leaves unresolved whether and where the LLM actually uses the hierarchical geometry it inherits. Shared structure is evidence of a common statistical origin; it is not evidence that the structure is inert in the transformer. Disentangling inherited-but-unused geometry from inherited-and-recruited geometry is the open problem this result sharpens rather than settles.

Original note title

word2vec and gemma unembeddings share the same hierarchical signature so structure is statistical not functional