How do functional features differ from representational abstract features?
This explores the difference between features that *encode what something is* (abstract representations of concepts) and features that *carry out operations on them* (the functional machinery a model uses), and how the corpus draws that line.
Put another way, the question is about the gap between representation and computation. The cleanest map comes from circuit tracing in Claude, which finds a four-tier hierarchy: token-level inputs, then abstract concepts, then *functional* operations, then outputs (see: How do language models organize features across processing layers?). Abstract features are the model's concepts: its internal idea of "capital city" or "plural noun." Functional features sit one tier up and act *on* those concepts; they're the verbs, not the nouns. The same work notes that larger models develop richer abstract features, which suggests scaling buys higher-level conceptual vocabulary rather than just more memorized patterns.
The sharpest reason to keep the two separate is that representation and computation can come fully apart. One striking result shows that networks can compute perfectly well with *no* interpretable activation structure at all: homomorphic encryption lets a model run the right function over scrambled internals, which shows that the pattern you can read off the activations and the operation actually being performed can be entirely decoupled (see: Do standard analysis methods hide nonlinear features in neural networks?). So a "representational" feature is something an analyst can decode; a "functional" feature is something that, if ablated, would break a specific computation. The two need not be the same thing.
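The decode-versus-ablate distinction can be made concrete with a toy example. The sketch below is illustrative only: the two-unit network, its weights, and the names `forward`, `decode_r2`, and `ablation_effect` are all assumptions invented here, not anything taken from the cited notes. One hidden unit holds a perfectly readable copy of an input variable but feeds nothing downstream; the other is harder to read but carries the whole computation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))      # columns: input variables a and b
a = X[:, 0]

# Hypothetical toy network: hidden unit 0 copies a, hidden unit 1 computes a + b.
W_in = np.array([[1.0, 1.0],
                 [0.0, 1.0]])      # shape (inputs, hidden units)
w_out = np.array([0.0, 1.0])       # the output reads hidden unit 1 only

def forward(X, ablate=None):
    """Return (hidden activations, output), optionally zeroing one hidden unit."""
    H = X @ W_in
    if ablate is not None:
        H = H.copy()
        H[:, ablate] = 0.0
    return H, H @ w_out

H, y = forward(X)

# Representational test: how well does each unit linearly predict a?
decode_r2 = [np.corrcoef(H[:, j], a)[0, 1] ** 2 for j in range(2)]

# Functional test: how far does the output move when each unit is ablated?
ablation_effect = [np.mean(np.abs(forward(X, ablate=j)[1] - y)) for j in range(2)]

print("decodability of a per unit:", decode_r2)
print("output change when ablated:", ablation_effect)
```

In this toy, unit 0 is the most decodable and the least causally relevant, which is the pattern the paragraph above warns about: a readable feature is not necessarily one the computation depends on.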
That decoupling is exactly why looking only at representations can mislead. A model can hold all the linearly decodable features a task needs while its internal organization is fractured and brittle: the representation looks complete, but the function it supports collapses under perturbation or distribution shift (see: Can models be smart without organized internal structure?). The flip side shows up when you go hunting for the functional machinery directly. Pruning experiments reveal that neural nets quietly split compositional tasks into isolated subnetworks, where knocking out one module disables only its corresponding operation (see: Do neural networks naturally learn modular compositional structure?). Those subnetworks are functional features in the most literal sense, physically separable operations, and pretraining makes them more reliably modular.
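What "knocking out one module disables only its corresponding operation" looks like can be sketched with binary weight masks. This is a hand-built illustration, not the method of the cited pruning experiments: the modularity here is wired in by construction, whereas the real experiments look for such masks in trained networks. The weight matrix, mask names, and `run` helper are hypothetical.

```python
import numpy as np

# Hypothetical two-module layer: module 0 doubles x[0], module 1 negates x[1].
W = np.array([[2.0, 0.0],
              [0.0, -1.0]])            # rows: output units, columns: inputs

# Binary pruning masks, one per ablated module (1 = keep weight, 0 = prune it).
mask_without_module0 = np.array([[0, 0], [1, 1]])
mask_without_module1 = np.array([[1, 1], [0, 0]])

def run(x, mask=None):
    """Apply the layer, optionally with some weights pruned by a binary mask."""
    weights = W if mask is None else W * mask
    return weights @ x

x = np.array([3.0, 5.0])
full = run(x)                              # both operations intact
no_m0 = run(x, mask_without_module0)       # doubling pruned away
no_m1 = run(x, mask_without_module1)       # negation pruned away

print(full, no_m0, no_m1)
```

Each mask removes one operation and leaves the other output untouched, which is the signature the pruning experiments treat as evidence of an isolated subnetwork.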
There's a lateral wrinkle worth noticing: abstract representations are often *geometric*, while functional behavior is *structural*. LLMs encode syntactic relations in polar coordinates, with type and direction carried jointly by the distance and the angle between embeddings, a spontaneously learned, symbol-compatible geometry that is pure representation (see: How do language models encode syntactic relations geometrically?). But meaning-features are entangled: intervene on one semantic axis and aligned ones shift proportionally, so the representation isn't a clean set of independent dials (see: Do LLM semantic features organize along human evaluation dimensions?). Functional features, by contrast, behave more like operations you can isolate and ablate. The binding problem frames why this matters: networks struggle to *dynamically* bind distributed representations into new compositional structures. That failure is functional, not representational, since the concepts are present but the machinery to recombine them on the fly is weak (see: Why do neural networks fail at compositional generalization?).
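The "aligned ones shift proportionally" claim has a simple linear reading, sketched below. It assumes a feature's value is the projection of an embedding onto a unit-length axis, which is a simplification chosen here and not necessarily how the cited study measures features. The axes, the embedding, and the `score` helper are made up for illustration. Under that assumption, pushing an embedding by `alpha` along one axis moves its score on any other axis by `alpha` times the cosine of the angle between the two.

```python
import numpy as np

# Hypothetical unit-length semantic axes that are 60 degrees apart.
axis_u = np.array([1.0, 0.0])
axis_v = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])

def score(embedding, axis):
    """Projection of an embedding onto a unit-length axis."""
    return float(embedding @ axis)

embedding = np.array([0.4, -0.2])
alpha = 2.0                                   # intervention strength along axis_u
steered = embedding + alpha * axis_u

on_target_shift = score(steered, axis_u) - score(embedding, axis_u)
off_target_shift = score(steered, axis_v) - score(embedding, axis_v)
predicted_off_target = alpha * float(axis_u @ axis_v)   # alpha * cos(angle)

print(on_target_shift, off_target_shift, predicted_off_target)
```

Only orthogonal axes would behave like independent dials; any overlap between axes makes an off-target shift unavoidable, whatever the starting embedding.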
So the difference, across the corpus, is less a taxonomy than a warning: a feature you can *read* (representation) and a feature that *does work* (function) live at different tiers, can be physically separated by pruning, and can drift entirely apart — which is why a model can look well-organized and still fail, or look scrambled and still compute.
Sources (7 notes)
Circuit tracing in Claude models reveals features progress from token-level inputs to abstract concepts to functional operations to outputs. Larger models develop richer abstract features, suggesting scaling enables higher-level conceptual reasoning rather than pattern memorization.
PCA, linear regression, and RSA over-represent simple linear features while under-representing equally important nonlinear features. Homomorphic encryption demonstrates that networks can compute perfectly well with no interpretable activation structure, proving representation patterns and computation can be entirely decoupled.
Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.
The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbol-compatible geometry.
Twenty-eight semantic axes in LLM embeddings reduce to three principal components matching human EPA structure. Intervening on one feature predictably shifts aligned features proportionally, creating unavoidable off-target effects that reflect how meaning is fundamentally organized.
Greff et al. argue that neural networks cannot dynamically bind distributed information into compositional structures due to three failures: segregating entities from inputs, maintaining representational separation, and reusing learned structure in novel combinations. Scaling can partially overcome this by enabling compositional representations to emerge.