Can geometric structure in representations exist without supporting functional mechanisms?
This explores whether a model can show clean, organized geometry in its internal representations — directions, hierarchies, coordinate systems — while that geometry isn't actually doing the work behind the model's behavior; the corpus says yes, and treats this gap as one of interpretability's central traps.
Put more precisely: can geometric structure in a model's activations be real and measurable yet functionally inert, present to a probe but not load-bearing for behavior? The corpus answers with a fairly emphatic yes, and turns the question into a warning about how we read neural networks. The sharpest case is the work on fractured, entangled representations. Two networks, one trained with SGD and one evolved, can produce identical outputs while the evolved one is cleanly organized and the SGD-trained one is internally broken. Weight perturbations expose tangled structure in the SGD-trained network that cannot transfer to novel contexts or recombine creatively (Can identical outputs hide broken internal representations?). Even more pointed for your question, a model can contain every linearly decodable feature a task needs (so a probe finds the geometry) while its internal organization is fundamentally fractured and quietly vulnerable to perturbation and distribution shift (Can models be smart without organized internal structure?). Decodable geometry, in other words, is not proof of a working mechanism.
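A deliberately artificial toy can make "decodable is not the same as used" concrete. The sketch below plants two binary features in synthetic 8-dimensional "activations", lets a hypothetical readout depend on only one of them, and fits a least-squares linear probe for each. The data, dimensions, and the `probe_accuracy` helper are illustrative assumptions, not taken from any of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "hidden layer" (illustrative assumption): 8-dim activations that
# linearly encode two binary features, a (used by the readout) and b (ignored).
n, d = 2000, 8
a = rng.integers(0, 2, n)
b = rng.integers(0, 2, n)
dir_a, dir_b = np.eye(d)[0], np.eye(d)[1]
H = np.outer(2 * a - 1, dir_a) + np.outer(2 * b - 1, dir_b)
H = H + 0.1 * rng.standard_normal((n, d))

# Hypothetical readout: it only looks along dir_a, so b never affects the output.
output = H @ dir_a

def probe_accuracy(H, labels):
    """Fit a least-squares linear probe (with bias term) and return its accuracy."""
    X = np.hstack([H, np.ones((len(H), 1))])
    w, *_ = np.linalg.lstsq(X, 2.0 * labels - 1.0, rcond=None)
    return float(np.mean((X @ w > 0) == (labels == 1)))

acc_a = probe_accuracy(H, a)
acc_b = probe_accuracy(H, b)
corr_b = float(np.corrcoef(output, b)[0, 1])  # how much the output tracks b
print(f"probe accuracy: a={acc_a:.3f}, b={acc_b:.3f}; corr(output, b)={corr_b:.3f}")
```

Both probes succeed, yet feature b has essentially no influence on `output`: the geometry is there for a probe to find, but it is not load-bearing. A toy like this cannot reproduce fractured representations; it only isolates the gap between decodability and use.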
That gap is exactly why the corpus argues you can't settle the question by looking at representations alone. Representational analysis finds correlations without causation, while causal analysis alone shows behavioral effects without explaining them; you have to locate a candidate feature geometrically and then verify it causally, by intervening, before you can claim the structure actually drives anything (Can we understand LLM mechanisms with only representational analysis?). The geometry-without-function scenario is precisely what survives representational analysis but dies under causal testing.
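The locate-then-verify loop can be sketched in a few lines of NumPy. Under the assumptions below (synthetic activations, a hypothetical `model_output` readout, and difference of class means as the "locate" step), both planted features are found geometrically, but only directional ablation reveals which one the computation actually depends on. Directional ablation is one common intervention, chosen here for illustration; the cited note does not prescribe a specific one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 16

# Synthetic activations (illustrative assumption) with two orthogonal planted
# directions: u_used feeds the readout, u_inert is merely present.
u_used = rng.standard_normal(d)
u_used /= np.linalg.norm(u_used)
u_inert = rng.standard_normal(d)
u_inert -= (u_inert @ u_used) * u_used
u_inert /= np.linalg.norm(u_inert)

x = rng.choice([-1.0, 1.0], size=n)   # feature the computation relies on
z = rng.choice([-1.0, 1.0], size=n)   # feature that is only decodable
H = np.outer(x, u_used) + np.outer(z, u_inert) + 0.05 * rng.standard_normal((n, d))

def model_output(H):
    """Hypothetical downstream computation: a nonlinearity on one projection."""
    return np.tanh(3.0 * (H @ u_used))

def locate(H, labels):
    """Representational step: candidate direction = difference of class means."""
    v = H[labels > 0].mean(axis=0) - H[labels < 0].mean(axis=0)
    return v / np.linalg.norm(v)

def ablate(H, direction):
    """Causal step: project every activation off the candidate direction."""
    return H - np.outer(H @ direction, direction)

def causal_effect(H, direction):
    """Mean absolute change in output after ablating `direction`."""
    return float(np.mean(np.abs(model_output(H) - model_output(ablate(H, direction)))))

effect_x = causal_effect(H, locate(H, x))
effect_z = causal_effect(H, locate(H, z))
print(f"ablating located x-direction: {effect_x:.3f}")
print(f"ablating located z-direction: {effect_z:.3f}")
```

Both directions would pass a probing test, but only ablating the x-direction changes the output; the z-direction is geometry without function. In a real model the ablation would be applied inside the forward pass (for example, with a forward hook) and the effect measured on behavior rather than on a toy readout.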
There's a second, subtler way structure can appear without a dedicated mechanism: it can be a statistical byproduct. Hierarchical concept geometry in LLMs turns out to fall straight out of the spectral structure of word co-occurrence. No hierarchy-specific machinery is required; the nested shape is a mathematical consequence of corpus statistics, and the same geometry appears in word2vec embeddings (Where does hierarchical structure in language models come from?). So 'no supporting mechanism' comes in two flavors here: structure that is broken (fractured) and structure that is epiphenomenal (emergent from data statistics rather than built by a circuit).
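A toy version of the spectral argument shows how nested geometry can come from counts alone. The sketch assumes an invented eight-word vocabulary whose co-occurrence counts carry a two-level category structure (the words, contexts, and counts are all hypothetical), then applies a plain SVD, the generic spectral step behind methods like latent semantic analysis. Nothing in the analysis knows about hierarchy. This illustrates the idea; it is not the cited paper's analysis.

```python
import numpy as np

# Hypothetical vocabulary with a two-level hierarchy:
# 2 top-level categories, each with 2 subcategories of 2 words.
words = ["dog", "wolf", "cat", "lion", "car", "truck", "boat", "ship"]
top = [0, 0, 0, 0, 1, 1, 1, 1]   # animal vs. vehicle
sub = [0, 0, 1, 1, 2, 2, 3, 3]   # canine, feline, road, water

# Context columns: 1 shared by every word, 2 top-level, 4 subcategory, and
# 8 word-specific contexts. The counts are invented for illustration.
n_words = len(words)
M = np.zeros((n_words, 1 + 2 + 4 + n_words))
for i in range(n_words):
    M[i, 0] = 1.0            # context every word appears in
    M[i, 1 + top[i]] = 3.0   # contexts shared within a top-level category
    M[i, 3 + sub[i]] = 2.0   # contexts shared within a subcategory
    M[i, 7 + i] = 1.0        # contexts unique to this word

# Generic spectral analysis of the word-by-context matrix: a plain SVD.
U, s, _ = np.linalg.svd(M, full_matrices=False)

print("singular values:", np.round(s, 3))
for k in range(4):
    print(f"component {k}:", dict(zip(words, np.round(U[:, k], 2))))
```

With these counts the leading component is shared equally by every word, the next one splits animals from vehicles, and the following pair splits each category into its subcategories. The hierarchy shows up as successive spectral components purely because of how the counts are structured.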
The corpus also marks the other pole, which is what makes the answer interesting rather than nihilistic: sometimes geometry genuinely is functional. Pruning experiments show networks carving compositional tasks into isolated subnetworks, where ablating one piece kills only its corresponding function, so the structure is causally confirmed (Do neural networks naturally learn modular compositional structure?). Activation-space directions for reasoning verbosity are steerable: a single vector extracted from 50 paired examples cuts chain-of-thought length by 67% while maintaining accuracy, which means that direction is doing causal work rather than just sitting there (Can we steer reasoning toward brevity without retraining?). The polar-coordinate encoding of syntax is a softer case: it nearly doubles probing accuracy over distance-only methods, showing the geometry carries more than distance alone captures, though probing accuracy is itself a representational measure rather than a causal test (How do language models encode syntactic relations geometrically?).
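The steering result suggests a simple recipe, sketched here under explicit assumptions. The note says only that one vector is extracted from 50 paired examples; computing it as the mean difference between paired activations is a common choice assumed here, not necessarily the method's own procedure. The activations are synthetic stand-ins, and `apply_steering` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_pairs = 32, 50   # 50 pairs mirrors the note; d is arbitrary

# Synthetic stand-ins for one layer's activations on paired prompts: each
# concise example differs from its verbose twin by a shared offset plus noise.
true_offset = rng.standard_normal(d)
verbose = rng.standard_normal((n_pairs, d))
concise = verbose + true_offset + 0.3 * rng.standard_normal((n_pairs, d))

# Assumed extraction rule: the steering vector is the mean paired difference.
steer = (concise - verbose).mean(axis=0)

def apply_steering(h, vector, alpha=1.0):
    """Shift activations along the steering vector (alpha sets the strength)."""
    return h + alpha * vector

cosine = float(steer @ true_offset / (np.linalg.norm(steer) * np.linalg.norm(true_offset)))
gap_before = float(np.linalg.norm(concise - verbose, axis=1).mean())
gap_after = float(np.linalg.norm(concise - apply_steering(verbose, steer), axis=1).mean())
print(f"cosine with planted offset: {cosine:.3f}")
print(f"mean distance to concise twin: {gap_before:.2f} -> {gap_after:.2f}")
```

Recovering the direction is only the representational half. What makes the steering result causal evidence is the downstream measurement: adding the vector at inference shortens chain-of-thought while accuracy holds, which a synthetic example like this cannot show.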
The quiet lesson threaded through all of these is that linear decodability, the thing we most often use to claim 'the structure is there', is the unreliable witness. It predicts compositional success in some setups (Can neural networks learn compositional skills without symbolic mechanisms?) yet coexists with fractured internals in others (Can models be smart without organized internal structure?). So the honest answer to your question is yes: geometric structure can absolutely exist without supporting function, which is exactly why a clean-looking probe result should make you reach for an intervention, not a conclusion.
Sources (8 notes)
Networks trained with SGD reproduce outputs perfectly while having radically different internal structure than evolved networks, with weight perturbations revealing fractured, entangled representations that prevent transfer to novel contexts or creative recombination.
Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This leaves them vulnerable to perturbation and distribution shift in ways that standard evaluation metrics do not reveal.
Representational analysis alone identifies correlations without causation; causal analysis alone shows behavioral effects without explaining them. Only paired methods—locating candidate features representationally, then verifying causally—produce complete mechanistic claims.
LLM hierarchical representations arise as a direct mathematical consequence of corpus statistics, not from hierarchy-specific mechanisms. Spectral analysis of word co-occurrence matrices predicts and reproduces the same nested geometry found in trained embeddings and word2vec models.
Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.
Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.
The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.
Standard MLPs achieve compositional generalization through data and model scaling alone, without architectural modifications, provided the training distribution sufficiently covers combinations of task modules. Linear decodability of constituents from hidden activations reliably predicts success.