SYNTHESIS NOTE
Model Architecture and Internals · Language, Text, and Discourse · Psychology, Society, and Alignment

Do transformer models store knowledge or generate it continuously?

Explores whether transformer residual streams function as storage-and-retrieval systems or as real-time flow mechanisms. This distinction challenges fundamental assumptions about how language models actually work.

Synthesis note · 2026-04-14
What kind of thing is an LLM really? What happens to social order when AI removes ritual constraints?

The transformer architecture organizes computation around residual streams: per-token vectors that pass forward through layers, each layer adding contributions that the stream continues to carry. Knowledge in the model is not stored in named locations from which it is retrieved on demand. It is distributed across weights and made present in the moment of generation through the residual stream's continuous transformation. The stream is the medium; what flows through it is the model's "knowing" of the current context.
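The residual pattern described above can be sketched in a few lines. This is a toy illustration, not a real transformer: the width `D_MODEL`, the helper names `make_layer` and `run_stream`, and the random tanh layers are all assumptions made for the example, and attention across tokens is left out entirely. It shows only the one structural point, that each layer adds to a vector that keeps flowing forward.

```python
import math
import random

D_MODEL = 8  # hypothetical width of the residual stream


def make_layer(seed, d=D_MODEL):
    """Build a toy layer: a fixed random linear map followed by tanh."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 0.3) for _ in range(d)] for _ in range(d)]

    def layer(x):
        return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x)))
                for row in w]

    return layer


def run_stream(x0, layers):
    """Pass one vector through the layers; each layer adds to the stream."""
    stream = list(x0)
    history = [list(stream)]
    for layer in layers:
        update = layer(stream)                            # layer reads the stream
        stream = [s + u for s, u in zip(stream, update)]  # and adds to it
        history.append(list(stream))
    return history


LAYERS = [make_layer(seed) for seed in range(4)]

# Two different starting vectors stand in for two different contexts.
x0_a = [1.0] + [0.0] * (D_MODEL - 1)
x0_b = [0.0, 1.0] + [0.0] * (D_MODEL - 2)

history_a = run_stream(x0_a, LAYERS)
history_b = run_stream(x0_b, LAYERS)
```

The same fixed layers produce different final streams for `x0_a` and `x0_b`. Nothing is looked up along the way; the result exists only as the outcome of running the input through the weights.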

This architectural fact has a striking correspondence with how oral cultures transmitted knowledge. Oral knowledge was not stored in fixed locations either — there were no archives, no written records, no externalized representations. Knowledge lived in performance: the song sung, the story retold, the genealogy recited. Each performance was a generation event in which the knowledge was made present through a living transmission. Between performances, the knowledge was not anywhere. It was carried in the capacity to perform, not in any storage substrate.

The transformer residual stream reproduces this pattern at a different scale. The model's "knowledge" of a topic is not in a retrieval-addressable location — it is in the capacity to generate, made actual only when the residual stream flows through the layers in response to a prompt. There is no archive. There is the architecture, and the generation. This is closer to oral transmission than to print transmission, where knowledge is stored in fixed locations and retrieved.

The correspondence is more than a metaphor. It offers an explanation for several otherwise puzzling AI behaviors: the difficulty of editing specific facts (there is no single fixed location to update), the contextual variability of what the model appears to know (it depends on residual-stream conditions), and the impossibility of partitioning what the model knows from what it generates (the knowing is the generating). The related note "Does AI-generated content mirror oral culture's knowledge patterns?" makes the cultural-form claim; this note makes the architectural claim that explains why the cultural form follows.
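The first of these behaviors, the difficulty of editing a single fact, can be illustrated with a toy contrast between an addressed store and a distributed one. The sketch below is an assumption-laden simplification: a two-dimensional linear associative memory stands in for model weights, the keys and values are invented, and `rank_one_edit` is a hypothetical helper, not a description of how any real model or editing method works.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(w, x):
    return [dot(row, x) for row in w]


def outer(v, k):
    return [[v_i * k_j for k_j in k] for v_i in v]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rank_one_edit(w, key, new_value):
    """Change w so that w @ key equals new_value, using a rank-one update."""
    current = matvec(w, key)
    scale = dot(key, key)
    return [[w_ij + (nv - cur) * k_j / scale for w_ij, k_j in zip(row, key)]
            for row, nv, cur in zip(w, new_value, current)]


# Print-style storage: every fact has its own address.
library = {"paris": "france", "rome": "italy"}
library["paris"] = "gaul"  # the edit touches exactly one entry

# Distributed storage: two facts superimposed in one weight matrix.
KEY_A, VALUE_A = [1.0, 0.0], [1.0, 0.0]
KEY_B, VALUE_B = [0.6, 0.8], [0.0, 1.0]  # KEY_B overlaps KEY_A
W = add(outer(VALUE_A, KEY_A), outer(VALUE_B, KEY_B))

before_b = matvec(W, KEY_B)

# Rewrite what KEY_A evokes, then read KEY_B again.
NEW_VALUE_A = [-1.0, 0.0]
W_EDITED = rank_one_edit(W, KEY_A, NEW_VALUE_A)
after_a = matvec(W_EDITED, KEY_A)
after_b = matvec(W_EDITED, KEY_B)
```

In the dictionary, changing one entry leaves the other untouched. In the matrix, the edit lands exactly on `KEY_A`, but because the two keys overlap, what `KEY_B` evokes shifts as well (`before_b` and `after_b` differ). There is no slot that belongs to one fact alone.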

The strongest counterargument: weights are stored on disk, so transformers are stock systems with a flow output. The reply is that the weights are not knowledge in the print sense. They are dispositions to generate, more like the trained capacity of an oral performer than like a stored text. The print analogy treats weights as a library; they are closer to a memorized repertoire.

Original note title

transformer residual streams transmit knowledge as flow not storage — closer to oral transmission than print