INQUIRING LINE


A model can look beautifully organized inside — while that organization does absolutely nothing to drive its actual behavior.

Can geometric structure in representations exist without supporting functional mechanisms?

This explores whether a model can show clean, organized geometry in its internal representations — directions, hierarchies, coordinate systems — while that geometry isn't actually doing the work behind the model's behavior; the corpus says yes, and treats this gap as one of interpretability's central traps.

This line asks whether geometric structure in a model's activations can be real and measurable yet functionally inert: present to a probe but not load-bearing for behavior. The corpus answers with a fairly emphatic yes, and it turns the question into a warning about how we read neural networks. The sharpest case is the work on fractured, entangled representations: two networks can produce identical outputs while one has clean organization and the other is internally broken, with weight perturbations exposing tangled structure that can't transfer or recombine (see "Can identical outputs hide broken internal representations?"). Even more pointed for your question, a model can contain all the linearly decodable features a task needs, so a probe finds the geometry, while its internal organization is fundamentally fractured and quietly vulnerable to perturbation and distribution shift (see "Can models be smart without organized internal structure?"). Decodable geometry, in other words, is not proof of a working mechanism.

That gap is exactly why the corpus argues you can't settle the question by looking at representations alone. Representational analysis finds correlations without causation; you have to locate a candidate feature geometrically and then verify it causally, by intervening, before you can claim the structure actually drives anything (see "Can we understand LLM mechanisms with only representational analysis?"). The geometry-without-function scenario is precisely what survives representational analysis but dies under causal testing.
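The locate-then-intervene recipe can be sketched on a toy system. This is a minimal illustration, not any cited paper's method: the directions `u` and `v`, the function `model_output`, and all constants are hypothetical. A binary feature is written along `u`, but the toy "model" reads only `v`, so a linear probe finds the feature while ablating the probe direction barely moves the output.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 16

# Two orthonormal directions in a hypothetical activation space.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))
u, v = q[:, 0], q[:, 1]  # u: decodable but unused; v: read by the output

z = rng.integers(0, 2, size=n)   # inert binary feature
s = rng.normal(size=n)           # feature that actually drives behavior
h = (np.outer(2.0 * z - 1.0, u) + np.outer(s, v)
     + 0.1 * rng.normal(size=(n, d)))

def model_output(acts):
    # Toy stand-in for downstream computation: it reads only direction v.
    return acts @ v

# Step 1 (representational): fit a least-squares linear probe for z.
X = np.hstack([h, np.ones((n, 1))])
w, *_ = np.linalg.lstsq(X, 2.0 * z - 1.0, rcond=None)
probe_acc = np.mean((X @ w > 0) == (z == 1))

# Step 2 (causal): project a direction out of the activations and
# measure how much the output changes.
def ablate(acts, direction):
    unit = direction / np.linalg.norm(direction)
    return acts - np.outer(acts @ unit, unit)

base = model_output(h)
effect_probe = np.mean(np.abs(model_output(ablate(h, w[:-1])) - base))
effect_v = np.mean(np.abs(model_output(ablate(h, v)) - base))

print(f"probe accuracy:               {probe_acc:.3f}")
print(f"output shift, probe ablation: {effect_probe:.3f}")
print(f"output shift, v ablation:     {effect_v:.3f}")
```

In this setup the probe is highly accurate, yet removing the direction it found leaves the output almost untouched, while removing `v` changes it substantially. A real model would need the same two steps applied to actual activations and actual behavior.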

There's a second, subtler way structure can appear without a dedicated mechanism: it can be a statistical byproduct. Hierarchical concept geometry in LLMs turns out to fall straight out of the spectral structure of word co-occurrence. No hierarchy-specific machinery is required; the nested shape is just a mathematical consequence of corpus statistics (see "Where does hierarchical structure in language models come from?"). So 'no supporting mechanism' has two flavors here: structure that's broken (fractured) and structure that's epiphenomenal (emergent from data statistics rather than built by a circuit).
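A small sketch can make the "statistical byproduct" idea concrete. It assumes a hypothetical 8-word vocabulary whose co-occurrence counts have a planted two-level hierarchy; the matrix `C` and its weights are invented for illustration and are not taken from the cited work. Nothing but an eigendecomposition is applied, and the resulting embedding distances come out nested.

```python
import numpy as np

# Hypothetical co-occurrence matrix for 8 words: two top-level groups
# of 4 words, each split into two pairs. Words co-occur more often the
# closer they sit in this planted hierarchy.
C = (
    1.0 * np.ones((8, 8))                          # everything co-occurs a little
    + 2.0 * np.kron(np.eye(2), np.ones((4, 4)))    # same top-level group
    + 4.0 * np.kron(np.eye(4), np.ones((2, 2)))    # same pair
    + 1.0 * np.eye(8)                              # self co-occurrence
)

# Spectral embedding: keep the top-k eigenvectors, scaled by the square
# roots of their eigenvalues. eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)
k = 4
emb = eigvecs[:, -k:] * np.sqrt(eigvals[-k:])

def dist(i, j):
    return float(np.linalg.norm(emb[i] - emb[j]))

d_pair = dist(0, 1)    # same pair
d_group = dist(0, 2)   # same top-level group, different pair
d_cross = dist(0, 4)   # different top-level groups

print(d_pair, d_group, d_cross)
```

The distances grow with separation in the hierarchy even though no component of the procedure represents a hierarchy explicitly. The ordering is inherited entirely from the statistics placed in `C`.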

The corpus also marks the other pole, which is what makes the answer interesting rather than nihilistic. Sometimes geometry genuinely is functional. Pruning experiments show networks carve compositional tasks into isolated subnetworks where ablating one piece kills only its corresponding function, which is structure you can causally confirm (see "Do neural networks naturally learn modular compositional structure?"). Activation-space directions for reasoning verbosity are steerable: extract one vector and you actually shorten chain-of-thought, which means that direction is doing causal work, not just sitting there (see "Can we steer reasoning toward brevity without retraining?"). And the polar-coordinate encoding of syntax earns its keep by nearly doubling probing accuracy over distance-only methods (see "How do language models encode syntactic relations geometrically?").
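The steering case can be illustrated with a generic difference-of-means sketch. This is not the Activation-Steered Compression procedure itself: `toy_length`, `true_dir`, and the scaling constants are hypothetical stand-ins, and only the use of 50 paired examples echoes the source note. The point is the causal test: adding the extracted vector changes the behavior it was extracted for.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_pairs = 32, 50

# Hypothetical ground-truth direction that the toy behavior depends on.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

def toy_length(acts):
    # Stand-in for reasoning length: grows with activation along true_dir.
    return 100.0 + 40.0 * (acts @ true_dir)

# Paired activations for the same prompts under verbose vs. concise output.
shared = rng.normal(size=(n_pairs, d))
verbose = shared + true_dir + 0.1 * rng.normal(size=(n_pairs, d))
concise = shared - true_dir + 0.1 * rng.normal(size=(n_pairs, d))

# Difference of means over the pairs gives a single steering vector
# pointing from "verbose" toward "concise".
steer = (concise - verbose).mean(axis=0)

# Causal check on fresh activations: add the vector, measure the behavior.
fresh = rng.normal(size=(200, d))
before = toy_length(fresh).mean()
after = toy_length(fresh + 0.5 * steer).mean()

print(f"mean length before steering: {before:.1f}")
print(f"mean length after steering:  {after:.1f}")
```

Here the drop in `toy_length` is what separates a functional direction from an inert one. Decodability alone would not distinguish this case from one where the direction exists but nothing downstream reads it.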

The quiet lesson threaded through all of these is that linear decodability, the thing we most often use to claim 'the structure is there', is the unreliable witness. It predicts compositional success in some setups (see "Can neural networks learn compositional skills without symbolic mechanisms?") yet coexists with fractured internals in others (see "Can models be smart without organized internal structure?"). So the honest answer to your question: yes, geometric structure can absolutely exist without supporting function, which is exactly why a clean-looking probe result should make you reach for an intervention, not a conclusion.

Sources (8 notes)

Can identical outputs hide broken internal representations?

Networks trained with SGD reproduce outputs perfectly while having radically different internal structure than evolved networks, with weight perturbations revealing fractured, entangled representations that prevent transfer to novel contexts or creative recombination.

Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.

Can we understand LLM mechanisms with only representational analysis?

Representational analysis alone identifies correlations without causation; causal analysis alone shows behavioral effects without explaining them. Only paired methods—locating candidate features representationally, then verifying causally—produce complete mechanistic claims.

Where does hierarchical structure in language models come from?

LLM hierarchical representations arise as a direct mathematical consequence of corpus statistics, not from hierarchy-specific mechanisms. Spectral analysis of word co-occurrence matrices predicts and reproduces the same nested geometry found in trained embeddings and word2vec models.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.


Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

How do language models encode syntactic relations geometrically?

The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.

Can neural networks learn compositional skills without symbolic mechanisms?

Standard MLPs achieve compositional generalization through data and model scaling alone, without architectural modifications, provided the training distribution sufficiently covers combinations of task modules. Linear decodability of constituents from hidden activations reliably predicts success.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Break It Down: Evidence for Structural Compositionality in Neural Networks (4.34 match)
Scaling can lead to compositional generalization (2.66 match)
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks (2.56 match)
Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence (2.44 match)
Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence (1.79 match)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control (1.67 match)
Faith and Fate: Limits of Transformers on Compositionality (1.67 match)
How do Transformers Learn Implicit Reasoning? (1.66 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a mechanistic interpretability researcher re-testing the claim that geometric structure in neural network representations can exist and be linearly decodable without causally supporting model behavior. The question remains open: what conditions permit 'geometry without function'?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as perishable constraints to re-examine.
• Fractured, entangled representations can produce identical outputs to clean ones, yet internal structure fails under perturbation and does not transfer (2025).
• Linear decodability of a feature does NOT guarantee the feature causally drives behavior; representational analysis alone cannot confirm functional load (2025).
• Hierarchical concept geometry in LLMs emerges as a statistical byproduct of word co-occurrence spectral structure, not from a dedicated mechanism (2026).
• Activation-space steering (e.g., extracting a chain-of-thought verbosity vector) does causally shorten reasoning, proving some geometric directions carry functional weight (2025).
• Compositional generalization can emerge from scaling data and model size without explicit compositional machinery, though modularity in pruned subnetworks shows isolable causal structure in some tasks (2025).

Anchor papers (verify; mind their dates):
• arXiv:2505.11581 (2025) — Fractured Entangled Representations
• arXiv:2507.07207 (2025) — Mechanistic Indicators of Understanding in LLMs
• arXiv:2507.04742 (2025) — Activation Steering for Chain-of-Thought Compression
• arXiv:2605.23821 (2026) — Hierarchical Concept Geometry

Your task:
(1) RE-TEST EACH CONSTRAINT. For fractured representations, have newer scaling regimes or training methods (e.g., constitutional AI, synthetic data, mixture-of-experts) *eliminated* the gap between decodability and robustness? For epiphenomenal geometry, does intervention on corpus-derived structure still fail to change behavior in recent LLMs? Separate the durable question (when is geometry inert?) from resolved limitations (modern models may have denser causal coupling).
(2) Surface the strongest work from the last ~6 months contradicting or superseding the "geometry without function" hypothesis — especially if newer mechanistic methods (e.g., SAE-based circuit discovery, causality-first probing) find geometry *always* couples to function under sufficient resolution.
(3) Propose 2 research questions assuming the regime has shifted: (a) Do emergent multimodal or reasoning-specific geometries show different structure–function decoupling than language geometry? (b) Can a single metric unify decodability + robustness to replace the current two-step verification?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
