INQUIRING LINE


Three training examples can hardwire a keyword into an AI — but only because the association was already latent, waiting to be surfaced.

Why does keyword priming require only three training exposures to establish?

This explores why so few training exposures (just three) are enough to lock in keyword priming — and the corpus suggests the answer is that priming doesn't build new knowledge, it surfaces associations the model already carries.

The most direct evidence reframes the whole question: priming isn't really *learning* in the sense of acquiring something new. The work on predicting priming from pre-learning probability shows that whether a keyword will become primed is already determined *before* any gradient updates happen. Post-learning priming is strongly predictable from the keyword's pre-learning probability, with a sharp threshold around 10^-3 separating contexts where priming takes hold from those where it never does (see: Can we predict keyword priming before learning happens?). If the association is already latent above that threshold, three exposures are enough because the training is nudging an existing pathway, not carving a new one. Below the threshold, no amount of the same exposure establishes it.
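The threshold rule above can be sketched as a simple check on the pre-learning probability. This is a minimal illustration, not the method used in the underlying work: the function names, the treatment of ~10^-3 as a hard cutoff, the choice to count the boundary value itself as "primes", and the example probabilities are all assumptions made for the sketch.

```python
import math

# Approximate threshold quoted in the text (~10^-3). Treating it as a hard
# cutoff, and counting the boundary itself as "primes", are assumptions.
PRIMING_THRESHOLD = 1e-3


def _check_probability(p: float) -> None:
    """Reject values that cannot be probabilities."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("pre_learning_prob must be a probability in [0, 1]")


def predicts_priming(pre_learning_prob: float,
                     threshold: float = PRIMING_THRESHOLD) -> bool:
    """Predict, before any training, whether a few exposures would prime the keyword."""
    _check_probability(pre_learning_prob)
    return pre_learning_prob >= threshold


def log10_margin(pre_learning_prob: float,
                 threshold: float = PRIMING_THRESHOLD) -> float:
    """Orders of magnitude above (+) or below (-) the threshold."""
    _check_probability(pre_learning_prob)
    if pre_learning_prob == 0.0:
        return float("-inf")  # infinitely far below any positive threshold
    return math.log10(pre_learning_prob) - math.log10(threshold)


if __name__ == "__main__":
    # Hypothetical pre-learning keyword probabilities, for illustration only.
    hypothetical_contexts = {"context_a": 5e-2, "context_b": 2e-3, "context_c": 4e-5}
    for name, p in hypothetical_contexts.items():
        print(name, predicts_priming(p), round(log10_margin(p), 2))
```

Working in log space makes the distance from the threshold easy to read: a margin of +1 means the keyword was already ten times more probable than the cutoff before training, which is the sense in which the outcome is a readout of what was already there.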

That 'activation, not injection' pattern shows up across the corpus under different names. Prompt optimization research draws the same hard line from the inference side: prompting can reorganize and retrieve what a model already holds, but it cannot supply knowledge absent from training, so there is a ceiling no clever prompt can break (see: Can prompt optimization teach models knowledge they lack?). Keyword priming is the training-time cousin of that ceiling: a few exposures activate a latent association cheaply, but they can't manufacture one from nothing.

The most striking parallel is in reasoning, where a *single* training example in RLVR can lift math accuracy from 36% to 73.6% and keep improving test accuracy for 1,400 steps after training accuracy has already reached 100% (see: Can a single training example unlock mathematical reasoning?). The lesson is the same: when the capability is already latent, the training signal's job is to *activate* it, and activation is fast and sample-cheap. Three exposures for priming and one example for reasoning are both signatures of unlocking, not building.

Why is the latent structure there in the first place? Other notes point back to pretraining as the layer where these dispositions get planted. Cognitive biases, for instance, are shaped almost entirely during pretraining and merely modulated by finetuning: models sharing a backbone show the same bias patterns regardless of what they're tuned on (see: Where do cognitive biases in language models come from?). And when a strong pretrained association exists, it dominates: models will ignore in-context information when parametric priors are strong enough (see: Why do language models ignore information in their context?). Priming works on three exposures for the same reason context-override happens: the prior is already a deep groove, and a little reinforcement is all it takes to make it dominant.

The thing worth walking away with: the 'three exposures' number isn't a fact about how fast models learn — it's a fact about how much was already there. The threshold is a readout of latent structure, which is why you can *predict* priming before training even starts.

Sources (5 notes)

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Can a single training example unlock mathematical reasoning?

A single example in RLVR boosts math performance from 36% to 73.6% and enables test accuracy to improve for 1,400 steps after training accuracy reaches 100%, revealing that minimal activation signals unlock latent reasoning capability.

Where do cognitive biases in language models come from?

A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher auditing claims about keyword priming and latent activation. The question remains open: *Why does keyword priming establish after only three training exposures?*

What a curated library found — and when (dated claims, not current truth): Findings span 2022–2026.
• Priming success is predictable from pre-learning keyword probability; a sharp threshold (~10^-3) separates contexts where priming takes hold from those where it never does, suggesting activation of latent structure rather than learning (2025).
• Prompt optimization cannot inject new knowledge absent from training; it only activates existing latent associations (2025).
• A single training example can lift math accuracy from 36% to 73.6% and sustain improvement for 1,400+ steps post-saturation, pointing to rapid activation of latent reasoning (2025).
• Cognitive biases in LLMs are shaped almost entirely during pretraining and merely modulated by finetuning; models sharing a backbone retain bias patterns regardless of downstream tuning (2025).
• Strong parametric priors established in pretraining override in-context information and can suppress context integration (2025).

Anchor papers (verify; mind their dates):
• arXiv:2504.20571 (2025-04) — Reinforcement Learning for Reasoning in LLMs with One Training Example
• arXiv:2507.07186 (2025-07) — Planted in Pretraining, Swayed by Finetuning: Origins of Cognitive Bias
• arXiv:2605.12484 (2026-05) — Learning, Fast and Slow: LLMs That Adapt Continually
• arXiv:2502.10708 (2025-02) — Injecting Domain-Specific Knowledge into LLMs

Your task:
(1) RE-TEST EACH CONSTRAINT. The library claims activation, not injection, governs priming; that pretraining plants latent structure; that three exposures is sample-efficiency for unlocking, not learning. For each, ask: have newer instruction-tuning methods, continual-learning frameworks, or fine-grained mechanistic probes since NARROWED the threshold, WIDENED the set of activatable knowledge, or SHIFTED the boundary between latent and injectable? Distinguish the durable question (how much is already there?) from perishable claims (the 10^-3 threshold, the three-exposure number). Name what relaxed or held it.
(2) Surface the strongest work from the last ~6 months that CONTRADICTS the "activation-not-injection" thesis or shows that finetuning, not pretraining, carries more weight than the library admits. Flag disagreements in the path itself.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., *Can targeted prompt-based pruning of parametric priors enable injection of out-of-distribution knowledge?* *Does continual adaptation degrade or preserve the latent-structure model?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
