SYNTHESIS NOTE

Can identical outputs hide broken internal representations?

Can neural networks produce correct outputs while having fundamentally fractured internal structure that prevents generalization and creativity? This challenges our assumptions about what performance benchmarks actually measure.

Synthesis note · 2026-02-23 · sourced from MechInterp

The FER hypothesis (Fractured Entangled Representation) poses a fundamental challenge to representational optimism — the implicit belief that as models scale and perform better, their internal representations must also be improving.

The experimental setup is elegantly simple: compare a compositional pattern-producing network (CPPN) evolved through open-ended search (Picbreeder) with an SGD-trained CPPN that reproduces the same output pixel-for-pixel. The outputs are identical. The internal representations are radically different. The evolved network explicitly represents the symmetry of a skull: perturbing its weights produces coherent variations (winking, warping) that respect the underlying structure. In the SGD-trained network, even slight perturbations shatter the symmetry, producing incoherent fragments that reveal no understanding of what it draws.
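The contrast can be sketched with a toy that is much smaller than the networks in the experiment. This is an illustrative assumption, not the Picbreeder or SGD-trained CPPN: `unified` and `fractured` are hypothetical hand-built functions, and the weights are arbitrary. Both render the same image, but only one of them represents left-right symmetry structurally.

```python
import math
import random

# Hypothetical toy "networks" mapping pixel coordinates (x, y) to an intensity.

def unified(x, y, w):
    # One set of weights sees |x|, so left-right symmetry is built into the structure.
    return math.tanh(w[0] * abs(x) + w[1] * y + w[2])

def fractured(x, y, w):
    # Separate weights per half-plane; symmetry holds only while the two sets match.
    if x < 0:
        return math.tanh(w[0] * (-x) + w[1] * y + w[2])
    return math.tanh(w[3] * x + w[4] * y + w[5])

GRID = [i / 4 for i in range(-4, 5)]  # 9 coordinates from -1.0 to 1.0

def render(net, w):
    # Evaluate the network on every pixel of the grid (rows indexed by y).
    return [[net(x, y, w) for x in GRID] for y in GRID]

def asymmetry(img):
    # Largest difference between a pixel and its left-right mirror image.
    return max(abs(row[i] - row[-1 - i]) for row in img for i in range(len(row)))

def perturb(w, scale, rng):
    # Add independent Gaussian noise to every weight.
    return [wi + rng.gauss(0.0, scale) for wi in w]

W_UNIFIED = [1.5, -0.8, 0.2]       # arbitrary illustrative weights
W_FRACTURED = W_UNIFIED * 2        # both halves start with matching weights

same_output = render(unified, W_UNIFIED) == render(fractured, W_FRACTURED)
asym_unified = asymmetry(render(unified, perturb(W_UNIFIED, 0.05, random.Random(0))))
asym_fractured = asymmetry(render(fractured, perturb(W_FRACTURED, 0.05, random.Random(0))))

print("identical output:", same_output)
print("asymmetry after perturbation (unified):  ", asym_unified)
print("asymmetry after perturbation (fractured):", asym_fractured)
```

In this sketch the unified function stays exactly symmetric under any weight change, because the symmetry lives in the structure rather than in a coincidence between weights. The fractured function loses symmetry as soon as its two halves drift apart.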

This is "imposter intelligence": the external appearance implies authentic internal representation, but the reality underneath is fractured across arbitrary subdomains and entangled across unrelated computations.

Three consequences for large models:

  1. Generalization in data-sparse regions. FER means the model cannot apply general principles from well-covered regions to sparse borderlands — precisely where AI could make its most valuable contributions. The principles are fractured, so they only apply to narrow arbitrary subdomains.

  2. Creativity. Creating something new requires understanding the regularities of what exists. If those regularities are represented in fractured form (counting bricks uses different circuits than counting apples), the model cannot extend or recombine concepts coherently.

  3. Continual learning. Learning is movement through weight space. If nearby points in weight space break regularities rather than respect them, learning cannot build on deep discoveries. This compounds in continual learning scenarios.

The challenge: standard benchmarks, including comprehensive behavioral evaluations, cannot distinguish FER from genuine representation. The imposter skull produces correct output for every possible input. Only weight perturbation analysis — probing the neighborhood of the solution, not the solution itself — reveals the pathology.
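A minimal sketch of the difference between the two kinds of measurement, assuming a hypothetical counting task borrowed from the bricks-and-apples example. The models `shared` and `split`, the function names, and the perturbation scales are all illustrative assumptions. `behavioral_error` plays the role of a benchmark that inspects the solution itself; `neighborhood_probe` inspects the weight-space neighborhood around it.

```python
import random

CATEGORIES = ("bricks", "apples")
INPUTS = [(c, n) for c in CATEGORIES for n in range(6)]

def shared(category, n, w):
    # One weight serves every category: the counting rule is stored once.
    return w[0] * n

def split(category, n, w):
    # One weight per category: the same rule is stored twice, independently.
    return w[CATEGORIES.index(category)] * n

def behavioral_error(model, w):
    # Benchmark-style check: worst output error against the target count n.
    return max(abs(model(c, n, w) - n) for c, n in INPUTS)

def neighborhood_probe(model, w, scale, trials=200, seed=0):
    # Mean violation of "the count does not depend on the category",
    # averaged over random Gaussian perturbations of the weights.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        wp = [wi + rng.gauss(0.0, scale) for wi in w]
        total += max(
            abs(model("bricks", n, wp) - model("apples", n, wp)) for n in range(6)
        )
    return total / trials

W_SHARED, W_SPLIT = [1.0], [1.0, 1.0]

print("behavioral error:", behavioral_error(shared, W_SHARED), behavioral_error(split, W_SPLIT))
for scale in (0.01, 0.1):
    print(
        "scale", scale,
        "shared:", neighborhood_probe(shared, W_SHARED, scale),
        "split:", neighborhood_probe(split, W_SPLIT, scale),
    )
```

Both toy models score a behavioral error of zero, so an output-only evaluation cannot tell them apart. Only the probe separates them: the shared model keeps the regularity at every perturbation scale, while the split model violates it more as the scale grows.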

This reframes what it means for a model to "understand" something. The related note "Can LLMs understand concepts they cannot apply?" describes the behavioral symptom. FER describes the mechanistic cause: the internal representation is fractured in ways that prevent the understanding from transferring to novel contexts.

Original note title

fractured entangled representations mean identical performance can mask fundamentally broken internal structure