INQUIRING LINE


Training doesn't teach AI models new associations — it mostly amplifies the ones they already faintly had.

What mechanism makes keyword probability the strongest predictor of priming?

This explores *why* a keyword's probability before any training predicts how strongly that keyword gets primed after gradient updates — i.e., what's actually happening under the hood that makes the pre-existing odds the key lever.

The short version the corpus points to: learning mostly *amplifies associations the model already had* rather than installing new ones. The finding itself shows a sharp ~10⁻³ probability threshold separating contexts where priming happens from those where it doesn't, and just three training exposures suffice to establish the effect (see: Can we predict keyword priming before learning happens?). That threshold behavior is the tell: it suggests gradient updates act like a multiplier on an existing seed of probability mass, so where there's a seed above the line, a few nudges grow it; where there isn't, the same nudges go nowhere. On this reading, the predictor works because it measures how much raw material is already present to be amplified.
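The "multiplier on an existing seed" picture can be illustrated with a toy calculation. This is a minimal sketch, not the mechanism established by the source: it assumes a single softmax in which a few updates raise the keyword's logit by a fixed amount while all other logits stay put. Under that assumption the keyword's odds are multiplied by a constant factor, so the absolute gain scales with the prior. The function name and the value of `DELTA` are hypothetical.

```python
import math

def prob_after_logit_shift(p: float, delta: float) -> float:
    """Probability of a token after its logit rises by `delta`,
    with every other logit held fixed (toy assumption)."""
    odds = p / (1.0 - p)
    new_odds = odds * math.exp(delta)  # a logit shift multiplies the odds
    return new_odds / (1.0 + new_odds)

DELTA = 3.0  # hypothetical total logit gain from a handful of updates

for prior in (1e-6, 1e-4, 1e-3, 1e-2):
    post = prob_after_logit_shift(prior, DELTA)
    print(f"prior={prior:.0e}  post={post:.2e}  gain={post - prior:.2e}")
```

With `DELTA = 3.0` the odds grow by a factor of e³, about 20, at every prior, which is why the same nudge produces a visible change for a seed near 10⁻³ and a negligible one for a seed near 10⁻⁶. The toy does not by itself produce a sharp threshold; it only shows why the starting probability sets the scale of the effect.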

That picture lines up with a broader pattern in the collection: what a model 'knows' after training is largely set during pretraining and only swayed afterward. Cognitive biases turn out to be planted in the pretrained backbone and merely modulated, not created, by later finetuning (see: Where do cognitive biases in language models come from?). Keyword priming looks like the same story at finer grain: the pre-learning probability *is* the pretrained prior, and gradient updates modulate rather than originate it. The prior dominates here for the same reason it does elsewhere: strong parametric associations override fresh input. Models fail to integrate new context precisely when prior training associations are strong enough to overrule it, and textual prompting alone can't dislodge them (see: Why do language models ignore information in their context?).

The sharpest mechanistic echo comes from work that decomposes chain-of-thought performance and finds that *output probability alone* swings accuracy from 26% to 70%, operating as a factor independent of genuine reasoning (see: What three separate factors drive chain-of-thought performance?). In other words, baseline probability is repeatedly the dominant hidden variable behind LLM behavior, and priming is one more place where it shows up as the load-bearing predictor. There may be a structural reason these priors are so entrenched, too: frequent words carry more probability mass and tend to be the more general ones (hypernyms occur more often than hyponyms), so high-prior keywords are plausibly the ones with the densest web of associations ready to be reinforced (see: Does word frequency correlate with semantic abstraction?).

The thing worth taking away: 'predictable from keyword probability' isn't a curiosity about correlation; it's evidence that fine-tuning is closer to *re-weighting* what's already latent than to teaching genuinely new content. That reframes a lot of practical questions. If you want a model to absorb a fact, its prior probability for the relevant keywords may matter more than how many times you show it, and a keyword sitting below the threshold may resist priming no matter how you train. It also suggests why surgical interventions outperform brute force generally: a related thread finds that only ~20% of tokens are high-entropy 'forking' tokens, and that this minority carries the learning signal in RL training (see: Do high-entropy tokens drive reasoning model improvements?). Across these notes, the recurring lesson is that a small, measurable property of the model's *existing* distribution, not the volume of new data, is what decides what changes.
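Checking a keyword's prior before deciding how to train can be sketched as a simple screen. This is an illustrative sketch only: the logits are made up, the helper names are hypothetical, and treating the ~10⁻³ figure as a hard cutoff with `>=` is an assumption, since the notes report an approximate threshold and not an exact decision rule.

```python
import math

PRIMING_THRESHOLD = 1e-3  # approximate value reported in the notes

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def keyword_probability(logits, keyword_id):
    """Pre-learning probability the model assigns to the keyword token."""
    return softmax(logits)[keyword_id]

def likely_to_prime(logits, keyword_id, threshold=PRIMING_THRESHOLD):
    """Hypothetical screen: is the keyword's prior at or above the threshold?"""
    return keyword_probability(logits, keyword_id) >= threshold

# Toy next-token logits over a five-token vocabulary (made-up numbers).
toy_logits = [5.0, 2.0, 0.0, -3.0, -6.0]
for token_id in range(len(toy_logits)):
    p = keyword_probability(toy_logits, token_id)
    print(token_id, f"{p:.2e}", likely_to_prime(toy_logits, token_id))
```

In a real setting the logits would come from the model's next-token distribution in the context of interest, measured before any fine-tuning step.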

Sources 6 notes

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Where do cognitive biases in language models come from?

A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

What three separate factors drive chain-of-thought performance?

A shift cipher study decomposed CoT into three independent factors: output probability alone swings accuracy from 26% to 70%, memorization matches pre-training frequency patterns, and genuine reasoning exists but accumulates error with each step. This resolves the reason-or-memorize debate by showing LLMs do both simultaneously.

Does word frequency correlate with semantic abstraction?

WordNet analysis shows hypernyms (general concepts) occur more frequently than hyponyms (specific ones). Combined with LLMs' frequency bias, this means preferring common paraphrases systematically drifts toward abstraction, erasing expert-level specificity.


Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.
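The selection step in the last note, picking out the high-entropy minority of token positions, can be sketched as follows. This is a minimal illustration under stated assumptions: the per-position distributions are made up, the function names are hypothetical, and the 0.2 fraction is taken from the note's "~20%" figure. How the resulting mask would be applied to a training loss is not specified in the notes and is left out.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_entropy_mask(distributions, fraction=0.2):
    """Mark the top `fraction` of positions by entropy as 'forking' tokens."""
    entropies = [entropy(d) for d in distributions]
    k = max(1, round(fraction * len(entropies)))
    ranked = sorted(range(len(entropies)),
                    key=lambda i: entropies[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(entropies))]

# Made-up next-token distributions for five positions in a sequence.
toy_distributions = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],   # maximally uncertain: a forking point
    [0.90, 0.05, 0.03, 0.02],
    [0.70, 0.10, 0.10, 0.10],
    [0.99, 0.005, 0.003, 0.002],
]
mask = high_entropy_mask(toy_distributions)
print(mask)
```

A training loop could then restrict updates to the positions where the mask is true, which is the kind of selective training the note describes.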

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM mechanistic analyst. The question remains open: *What mechanism makes keyword probability the strongest predictor of priming?* A curated library (2018–2026) found—and these are dated claims, not current truth:

• A sharp ~10⁻³ probability threshold separates contexts where priming succeeds from failure; just three exposures suffice to cross it (2024).
• Fine-tuning acts as a *multiplier on pre-existing probability mass* rather than installing new associations; models amplify what pretraining seeded (2024–2025).
• Output probability alone swings chain-of-thought accuracy from 26% to 70%, independent of reasoning depth (2024).
• High-frequency words tend to be more general (hypernyms occur more often than hyponyms), so high-prior keywords plausibly carry a denser web of associations (2025).
• Only ~20% of tokens are high-entropy 'forking' tokens; RL training on that minority alone matches or exceeds full-gradient performance (2025).

Anchor papers (verify; mind their dates):
• arXiv:2407.01687 (2024): CoT probability factors.
• arXiv:2507.07186 (2025): Cognitive biases planted in pretraining, modulated by finetuning.
• arXiv:2506.01939 (2025): High-entropy minority tokens drive RL efficacy.
• arXiv:2603.03415 (2026): OOD representation sparsity.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, ask whether newer evals, optimizers, in-context learning methods, synthetic data, or multi-step prompting have since *relaxed* the threshold, enabled teaching below it, or replaced gradient-based fine-tuning as the dominant path. Separate the durable question (likelihood that priors dominate) from perishable limitations (the 10⁻³ threshold, three-exposure rule). Name what resolved it; flag where it still holds.
(2) Surface the strongest *contradicting* or *superseding* work from the last ~6 months—papers showing priming *without* prior probability signal, or new data *overriding* old priors at scale.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., can surgical intervention *below* the threshold succeed now? Does retrieval-augmented fine-tuning dissolve the probability dominance?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
