SYNTHESIS NOTE

How do language models encode syntactic relations geometrically?

Do LLM embeddings use distance alone, or also direction, to represent syntax? The answer bears on whether neural networks can spontaneously develop geometric structures that are compatible with symbolic representations.

Synthesis note · 2026-02-23 · sourced from Cognitive Models Latent

The symbol-vector divide has been a core challenge in cognitive science since Smolensky (1987): syntactic trees are symbolic structures that seem incompatible with the vectorial representations of neural networks. The Structural Probe (Hewitt & Manning 2019) made partial progress: it found a linear subspace of activations in which syntactically linked words lie close together, so the existence of a syntactic link is encoded in the distance between the corresponding embeddings. Whether the type and direction of syntactic relations were also represented remained unknown.
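
The distance-only idea can be illustrated with a minimal sketch. It assumes a linear map B that has already been learned, and it reads an undirected tree off the projected distances with a minimum spanning tree. The words, embeddings, and matrix below are hypothetical toy values, not taken from any model or from the original probe code.

```python
import math

def project(B, h):
    """Apply the linear map B (k rows, each of length d) to a d-dimensional vector h."""
    return [sum(b * x for b, x in zip(row, h)) for row in B]

def probe_distance(B, h_i, h_j):
    """Euclidean distance between two embeddings after projection by B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    return math.sqrt(sum(v * v for v in project(B, diff)))

def minimum_spanning_tree(dist):
    """Prim's algorithm over a full distance matrix; returns sorted undirected edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        edges.append((min(i, j), max(i, j)))
        in_tree.add(j)
    return sorted(edges)

# Hypothetical toy data: 3-dimensional "activations" for a three-word sentence.
words = ["the", "cat", "sleeps"]
embeddings = [[0.0, 0.0, 5.0], [1.0, 0.0, -3.0], [2.0, 1.0, 9.0]]
# Assumed probe: keeps the first two dimensions and discards the third.
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

dist = [[probe_distance(B, a, b) for b in embeddings] for a in embeddings]
tree = minimum_spanning_tree(dist)
print([(words[i], words[j]) for i, j in tree])
# prints [('the', 'cat'), ('cat', 'sleeps')]
```

The recovered edges say which words are linked, but nothing in a distance tells us which word is the head or what kind of relation holds. That is the gap described above.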

The Polar Probe answers this: syntactic relations are coded by the relative direction between nearby embeddings, not just by their distance. By reading both distance and direction (a polar coordinate system), the Polar Probe recovers the type and direction of syntactic relations with nearly twice the accuracy of the distance-only Structural Probe.
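
One way to picture direction-based decoding is the sketch below. It is an illustration under stated assumptions, not the Polar Probe's actual training or decoding procedure. It assumes the words are already projected into a 2-dimensional probe space, and that each relation type has a hypothetical prototype direction pointing from head to dependent. The function name, prototypes, and coordinates are all invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def decode_relation(z_i, z_j, prototypes):
    """Return (relation, head, dependent) for a linked pair of projected words.

    The relation is the prototype whose axis best aligns with z_j - z_i.
    The sign of the alignment decides which word is the head (assumption:
    prototypes point from head to dependent).
    """
    v = [b - a for a, b in zip(z_i, z_j)]
    label, score = max(
        ((name, cosine(v, u)) for name, u in prototypes.items()),
        key=lambda item: abs(item[1]),
    )
    return (label, "i", "j") if score > 0 else (label, "j", "i")

# Hypothetical prototype directions (head -> dependent) in probe space.
prototypes = {"nsubj": [1.0, 0.0], "det": [0.0, 1.0]}

# Hypothetical projected positions for "the cat sleeps".
z = {"sleeps": [0.0, 0.0], "cat": [1.0, 0.1], "the": [1.1, 1.0]}

print(decode_relation(z["cat"], z["sleeps"], prototypes))
# prints ('nsubj', 'j', 'i'): "sleeps" (j) heads "cat" (i)
print(decode_relation(z["cat"], z["the"], prototypes))
# prints ('det', 'i', 'j'): "cat" (i) heads "the" (j)
```

In this toy setup the distance between two points still signals that a link exists, while the angle of the difference vector carries the relation label and its orientation.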

Three key findings:

Complete syntactic encoding. The polar coordinate system captures existence, type, AND direction of syntactic relations — the full specification of a dependency tree is encoded in the geometry of LLM activations.
Low-dimensional subspace. This encoding exists in a low-dimensional subspace of intermediate layers across many LLMs, and becomes increasingly precise in frontier models. This is not a brute-force representation but a compressed, structured one.
Nested consistency. Similar syntactic relations are coded similarly across nested levels of syntactic trees. The encoding is not ad hoc for each syntactic instance but systematic — a genuine coordinate system.
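
The nested-consistency finding suggests a simple check, sketched here under loud assumptions: collect head-to-dependent difference vectors in probe space, group them by relation type and tree depth, and compare their directions. The vectors and the helper names are hypothetical and chosen only to show the shape of the comparison. They are not measurements.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def mean_cosine(group_a, group_b):
    """Mean cosine similarity over all pairs drawn from two groups of vectors."""
    sims = [
        sum(x * y for x, y in zip(unit(a), unit(b)))
        for a in group_a
        for b in group_b
    ]
    return sum(sims) / len(sims)

# Hypothetical head -> dependent offsets, keyed by (relation, depth in the tree).
offsets = {
    ("nsubj", 1): [[1.0, 0.1], [0.9, 0.0]],
    ("nsubj", 2): [[1.1, -0.1], [0.8, 0.1]],
    ("obj", 1): [[-0.1, 1.0], [0.0, 0.9]],
}

# Same relation at two nesting levels versus two different relations.
same_relation = mean_cosine(offsets[("nsubj", 1)], offsets[("nsubj", 2)])
different_relation = mean_cosine(offsets[("nsubj", 1)], offsets[("obj", 1)])
print(round(same_relation, 2), round(different_relation, 2))
```

If the encoding is a genuine coordinate system, the first number should stay high across depths while the second stays low, which is how the toy values were constructed.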

The resolution of the symbol-vector divide is significant: LLMs don't need explicit symbolic mechanisms to represent symbolic structures. They spontaneously learn a geometry that explicitly represents the main symbolic structures of linguistic theory. This doesn't mean LLMs "understand" syntax in a human sense, but it demonstrates that connectionist architectures can natively develop symbolic-compatible representations — the two paradigms are not incompatible.

This connects to the note "Do transformer static embeddings actually encode semantic meaning?" at a different structural level: static embeddings encode semantic features, while intermediate activations encode syntactic relations. Together they suggest that LLM representations are far richer and more structured than the "statistical patterns" dismissal implies.

Inquiring lines that read this note 54

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

When does architectural design matter more than raw model capacity?

How do embedding dimension limits constrain what concept models can represent?

What limits mechanistic interpretability's ability to characterize models?

Do language models understand semantics or rely on pattern matching?

What role does compression play in language model capability and generalization?

What compression explains why syntax fits in low-dimensional subspaces?

Why do semantic similarity and task relevance diverge in vector embeddings?

Why do language models struggle with implicit discourse relations?

What other semantic relations benefit from explicit surface markers in text?

Do language models learn genuine linguistic structure or just surface patterns?

Can AI-generated outputs constitute genuine knowledge or valid claims?

What happens when you tightly couple two representations together?

What articulatory information do speech signals carry that text cannot?

Do language model representations contain causally steerable task-specific features?

Do language models perform faithful symbolic reasoning independent of semantic grounding?

Do language models develop causal world models or rely on statistical patterns?

Does model scaling alone produce compositional generalization without symbolic mechanisms?

How do transformer attention mechanisms implement memory and algorithmic functions?

How does iconicity detection work within static embeddings before any attention?

Is embodied interaction necessary for language meaning and genuine agency?

What factors beyond surface content determine how readers extract meaning differently?

What spectral signatures distinguish hierarchy-driven geometry from corpus-driven geometry?

What critical LLM failures do standard benchmarks hide?

Can language models execute iterative numerical methods in latent space?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

Does Gemma's transformer explicitly exploit the inherited hierarchical geometry?

Should GUI agents use structured representations instead of raw pixels?

How does serializing screen layout to text preserve spatial relationships?

How does reasoning graph topology affect breakthrough insights and generalization?

What role does embedding space geometry play in multi-hop reasoning?

Related concepts in this collection 5

Do transformer static embeddings actually encode semantic meaning? Explores whether the fixed word embeddings that enter transformer networks contain rich semantic information or serve only as shallow placeholders. This addresses a longstanding debate in philosophy of language about whether word meanings are stored or constructed.
semantic features in static embeddings complement syntactic features in intermediate activations
Why do neural networks fail at compositional generalization? Exploring whether the binding problem from neuroscience explains neural networks' inability to systematically generalize. The binding problem has three aspects—segregation, representation, and composition—each creating distinct failure modes in how networks handle structured information.
polar coordinate encoding is evidence against the strong version: systematic structure IS represented, even if binding problems remain at the compositional level
Can neural networks learn compositional skills without symbolic mechanisms? Do neural networks need explicit symbolic architecture to compose learned concepts, or can scaling alone enable compositional generalization? This asks whether compositionality is an architectural feature or an emergent property of scale.
convergent: symbolic-like structure emerges without explicit symbolic mechanisms
Do neural networks naturally learn modular compositional structure? Explores whether neural networks decompose compositional tasks into distinct subroutines without explicit symbolic design. This challenges the longstanding view that neural networks are fundamentally non-compositional.
related: modular structure emerges from training
Where does hierarchical structure in language models come from? Do LLMs build hierarchical concept geometry through dedicated mechanisms, or does it emerge naturally from word co-occurrence patterns in training data? Understanding the source matters for interpreting what representations actually reveal about model computation.
contrasts: this note reads symbolic-compatible geometry as spontaneously learned, but distributional theory shows such structure can be a shadow of co-occurrence statistics rather than a learned mechanism

Original note title

a polar coordinate system in llm activations encodes both type and direction of syntactic relations — resolving the symbol-vector divide
