INQUIRING LINE

What looks like genuine AI understanding might just be a tally of how often the model saw something during training.

How does representational density emerge from training data familiarity?

This line explores why neural networks build dense, rich internal representations for data they saw often during training, and what that says about the difference between genuine learning and frequency-driven familiarity. The core finding is almost behavioral: during pretraining, a model fires dense activations for inputs it recognizes from its training diet and falls back to sparse, thin representations for anything unfamiliar, and this happens on its own, without any task-specific tuning (see "Is representational sparsity learned or intrinsic to neural networks?"). Density, in other words, isn't a fixed property of the architecture. It's a residue of exposure.
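One way to make "density" concrete is as the fraction of hidden units that are meaningfully active for a given input. The sketch below is an illustration only: the function name `activation_density`, the threshold `eps`, and the toy activation vectors are assumptions made here, not the metric or data used in the cited note.

```python
import numpy as np

def activation_density(activations, eps=1e-6):
    """Fraction of units whose absolute activation exceeds eps (hypothetical metric)."""
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(np.abs(activations) > eps))

# Toy post-ReLU activations (made-up values, for illustration only).
familiar = np.array([0.8, 0.3, 0.0, 1.2, 0.5, 0.9, 0.0, 0.4])
unfamiliar = np.array([0.0, 0.0, 0.0, 1.1, 0.0, 0.0, 0.0, 0.2])

print(activation_density(familiar))    # 6 of 8 units are active
print(activation_density(unfamiliar))  # 2 of 8 units are active
```

A thresholded count is the simplest choice; a real analysis would need to pick the layer, the threshold, and the aggregation across tokens, none of which the note specifies.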

That reframes a lot of what looks like 'understanding' as something closer to frequency bookkeeping. When researchers traced multimodal models' supposed zero-shot generalization, they found that performance tracks how often a concept actually appeared in pretraining. Models need exponentially more data for linear gains, which suggests the impressive results are interpolation over familiar territory, not leaps into the new (see "Does multimodal zero-shot performance actually generalize or interpolate?"). The same logic shows up at the micro scale: whether a keyword gets 'primed' after a gradient update is predictable from its probability *before* learning, with a sharp threshold near 10^-3 and as few as three exposures needed to lock the effect in (see "Can we predict keyword priming before learning happens?"). Familiarity isn't a vague feeling the model has. It's a measurable quantity that forecasts how it will represent and recall things.
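"Exponentially more data for linear gains" is another way of saying that performance grows roughly linearly in the logarithm of concept frequency. The sketch below fits that relationship to synthetic numbers; the frequencies, accuracies, and the helper `freq_needed` are invented for illustration and are not results from the cited study.

```python
import numpy as np

# Synthetic concept frequencies and accuracies (made up to follow a log-linear trend).
freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
acc = np.array([0.10, 0.20, 0.30, 0.40, 0.50])

# Fit acc ~ slope * log10(freq) + intercept.
slope, intercept = np.polyfit(np.log10(freq), acc, deg=1)

def freq_needed(target_acc):
    """Pretraining frequency the fitted line implies for a target accuracy."""
    return 10 ** ((target_acc - intercept) / slope)

# Ratio of data needed to move from 0.5 to 0.6 accuracy under the fitted line.
ratio = freq_needed(0.6) / freq_needed(0.5)
print(slope)  # accuracy gained per 10x increase in frequency
print(ratio)  # multiplicative data cost of one more fixed-size gain
```

Under this toy fit, each additional fixed step in accuracy costs a constant multiple of the data already seen, which is the shape of the claim in the paragraph above.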

But familiarity isn't one thing, and this is where the corpus gets interesting. There's a real split between facts and procedures. Factual recall depends on narrow, document-specific memorization: the model essentially needs to have seen *that fact* in *that document*. Reasoning, by contrast, draws on broad procedural knowledge distributed across many diverse sources, which is why it generalizes where memorized facts don't (see "Does procedural knowledge drive reasoning more than factual retrieval?"). So 'density from familiarity' has two flavors: a brittle, lookup-style density for memorized facts, and a more transferable density built from repeated exposure to *patterns of doing* rather than *items to retrieve*.

There's also a structural side to how this density organizes itself. Networks don't just accumulate a dense blur. Pretraining sharpens compositional structure, carving tasks into modular subnetworks where ablating one piece breaks only its corresponding function, and this modularity gets more reliable the more pretraining a model has had (see "Do neural networks naturally learn modular compositional structure?"). Familiarity, then, doesn't only thicken representations; it also consolidates them into reusable parts. A complementary line of work suggests this consolidation needs far fewer samples when learning targets latent structure rather than raw tokens: because same-level latents are far more correlated than surface tokens, the model recovers hierarchy with a sample count that doesn't blow up with depth (see "Why is predicting latents more sample-efficient than tokens?").
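The ablation test behind the modularity claim can be sketched on a toy linear model. Here the modularity is built in by hand through block-diagonal weights, so the example shows only what "ablating one piece breaks only its corresponding function" looks like, not how such structure is learned; the weights, the unit-to-task assignment, and the helper names are all assumptions.

```python
import numpy as np

# Block-diagonal weights: units 0-1 serve task A, units 2-3 serve task B (toy setup).
W = np.array([
    [1.0, 2.0, 0.0, 0.0],   # task A reads only units 0 and 1
    [0.0, 0.0, 3.0, 1.0],   # task B reads only units 2 and 3
])

def forward(x, weights):
    """Return the outputs [task_A, task_B] for input x."""
    return weights @ x

def ablate(weights, units):
    """Return a copy of the weights with the given input units zeroed out."""
    pruned = weights.copy()
    pruned[:, units] = 0.0
    return pruned

x = np.array([1.0, 1.0, 1.0, 1.0])
intact = forward(x, W)
ablated = forward(x, ablate(W, [0, 1]))  # remove task A's subnetwork

print(intact)   # both task outputs present
print(ablated)  # task A output collapses, task B output unchanged
```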

The quietly subversive payoff: if density is learned through exposure and base models already carry the structure, then much of what post-training 'adds' may just be elicitation of what familiarity already built. Five independent mechanisms (RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR) all surface reasoning that was already latent in base activations, suggesting post-training selects rather than creates (see "Do base models already contain hidden reasoning ability?"). That also explains why heavy fine-tuning can backfire: directly rewriting weights can corrupt the knowledge stored in lower layers, while decoding-time approaches that leave those familiar representations untouched preserve knowledge far better (see "Can decoding-time tuning preserve knowledge better than weight fine-tuning?"). The density built by familiarity is valuable and fragile at once: worth eliciting, dangerous to overwrite.
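The decoding-time idea can be sketched as logit arithmetic: the base model's next-token logits are shifted by the difference between a small tuned model and its untuned counterpart, and the base weights are never modified. This is a minimal sketch of that idea as it is commonly described for proxy-tuning; the three-token vocabulary, the logit values, and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def proxy_tuned_probs(base_logits, expert_logits, anti_expert_logits):
    """Shift base logits by the (tuned - untuned) difference of a small model pair."""
    shifted = base_logits + (expert_logits - anti_expert_logits)
    return softmax(shifted)

# Toy next-token logits over a 3-token vocabulary (made-up values).
base = np.array([2.0, 1.0, 0.5])    # large base model, weights untouched
expert = np.array([0.5, 2.0, 0.0])  # small tuned model
anti = np.array([0.5, 0.5, 0.0])    # the same small model before tuning

probs = proxy_tuned_probs(base, expert, anti)
print(np.argmax(softmax(base)))  # token the base model prefers on its own
print(np.argmax(probs))          # token preferred after the decoding-time shift
```

If the tuned and untuned small models agree, the shift is zero and the base distribution is returned unchanged, which is why this style of steering leaves the base model's stored knowledge intact.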

Sources 8 notes

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

Does multimodal zero-shot performance actually generalize or interpolate?

Across 34 models and 5 datasets, multimodal models require exponentially more pretraining data for linear performance gains on downstream tasks. Performance correlates with how often test concepts appeared during pretraining rather than reflecting genuine generalization ability.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Why is predicting latents more sample-efficient than tokens?

A formal sample-complexity analysis proves latent-level self-supervision (data2vec/JEPA style) recovers compositional structure with samples constant in hierarchy depth, while token-level learning requires exponential samples—because same-level latents are far more correlated than raw tokens.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about how representational density in LLMs emerges from training data familiarity. The question remains: Does exposure to frequent data *create* dense, reusable representations, or merely *surface* structure that pre-exists in the architecture?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as perishable constraints:
- Dense activations emerge selectively for familiar training inputs and sparse representations for OOD data; separately, multimodal zero-shot performance requires exponentially more pretraining data for linear gains (~2024).
- Factual recall depends on narrow memorization of specific documents; procedural knowledge generalizes broadly across diverse sources, driving reasoning (~2025).
- Compositional modularity (subnetworks whose ablation breaks only their own function) sharpens reliably with more pretraining; learning latent structure is exponentially more sample-efficient than token-level learning (~2026).
- Base models already contain latent reasoning capabilities; post-training elicits rather than creates, while weight-rewriting corrupts lower-layer representations (~2025).
- OOD shift intensity predicts sparsity; weight-sparse transformers exhibit interpretable circuits (~2026).

Anchor papers (verify; mind their dates):
- arXiv:2404.04125 (2024-04) — multimodal zero-shot and pretraining concept frequency
- arXiv:2411.12580 (2024-11) — procedural vs. factual knowledge split
- arXiv:2605.27734 (2026-05) — latent-space learning sample complexity
- arXiv:2603.03415 (2026-03) — OOD sparsity mechanisms

Your task:
(1) RE-TEST EACH CONSTRAINT. For newer reasoning and multimodal models (e.g., o1, o3, and models trained post-2026Q1), has the exponential data scaling curve for zero-shot flattened? Do procedural vs. factual splits still hold under instruction tuning at scale? Can decoding-time steering now *create* (not just elicit) novel reasoning? Separate the durable question (does frequency drive density?) from perishable limits tied to specific architectures or datasets. Cite what has relaxed each constraint.
(2) Surface the strongest contradicting or superseding work from the last ~6 months—especially any showing density *without* high pretraining frequency, or emergent structure *independent* of data exposure.
(3) Propose 2 research questions assuming the regime may have shifted: (a) Can density be induced *inversely*—built for novel, low-frequency concepts through synthetic diversity? (b) Does continual adaptation (fast + slow learning) rebuild density for drifting, unfamiliar data, or does it corrupt the pretraining residue?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
