INQUIRING LINE


When you correct an AI's wrong belief mid-conversation, it often ignores you — because facts aren't stored anywhere editable.

Why is editing specific facts so difficult in language models?

This reads the question as asking why you can't simply reach in and change one fact a model 'knows' — and the corpus answers obliquely, because facts in an LLM aren't stored in editable slots but baked into overlapping training associations.

This line explores why editing a specific fact in a language model is so hard, and the collection's best answer is that there is no single place where a fact lives to be edited. What looks like a discrete fact is really a strong statistical association laid down during training, and those priors dominate. Research shows that even when corrected information is placed directly in the prompt, the model often ignores it: parametric knowledge from training overrides the in-context correction, and only intervening in the model's internal representations reliably changes the output (Why do language models ignore information in their context?). So the easiest 'edit', simply telling the model the new fact, frequently fails.
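The contrast between a weak in-context nudge and a direct edit to the internal representation can be made concrete with a toy numeric sketch. Everything here is a hypothetical illustration, not a model of any real LLM: the two-dimensional hidden state, the `READOUT` directions, and the sizes of `context_nudge` and `steer` are invented assumptions chosen so that the training prior is much stronger than the prompt.

```python
import math

def softmax(logits):
    """Turn a dict of logits into probabilities (numerically stable)."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical readout: each answer has a direction in a tiny 2-D hidden space,
# and its logit is the dot product of the hidden state with that direction.
READOUT = {"old_fact": (1.0, 0.0), "new_fact": (0.0, 1.0)}

def logits_from_hidden(h):
    return {k: h[0] * w[0] + h[1] * w[1] for k, w in READOUT.items()}

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def top_answer(h):
    """Return the most probable answer and the full distribution."""
    probs = softmax(logits_from_hidden(h))
    return max(probs, key=probs.get), probs

# Assumption: training left a strong association with the old fact.
prior_hidden = (6.0, 0.0)

# Assumption: a corrective sentence in the prompt shifts the state only a little.
context_nudge = (0.0, 2.0)
prompted = add(prior_hidden, context_nudge)

# Assumption: a representation-level intervention adds a vector that removes
# the old direction and strengthens the new one.
steer = (-6.0, 4.0)
steered = add(prompted, steer)

print(top_answer(prompted)[0])  # old_fact
print(top_answer(steered)[0])   # new_fact
```

With these assumed numbers the prompted state still favors `old_fact` (logits 6 vs 2), while the steered state favors `new_fact` (0 vs 6). Real activation steering operates on high-dimensional hidden vectors, and how weak a given prompt's effect is remains an empirical question.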

The next instinct is to fix the fact by prompting cleverly, but the corpus draws a hard line here: prompting only reorganizes knowledge already in the training distribution and cannot inject knowledge the model never learned (Can prompt optimization teach models knowledge they lack?). Surface-level interventions can bring out or suppress what is already there, but they do not write new facts in. Editing a fact is closer to retraining than to rewriting a database entry.

There's also a deeper reason the target is slippery: the model doesn't hold one committed version of things. Shanahan's 20-questions test shows that an LLM maintains a superposition of possibilities consistent with the conversation so far and samples one at generation time, so regenerating the same prompt yields different answers (Do large language models actually commit to a single character?). If the model never commits to a single stored value, there is no fixed object to overwrite; you would have to shift a whole distribution.
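A short simulation shows why there is no single value to overwrite when a model samples from a superposition. The candidate objects and their probabilities are invented for illustration; the sketch only assumes, as the 20-questions result describes, that several answers stay consistent with the conversation and one is drawn per regeneration.

```python
import random
from collections import Counter

def regenerate(beliefs, n, seed=0):
    """Draw one answer per regeneration from the (hypothetical) beliefs."""
    rng = random.Random(seed)
    names = list(beliefs)
    weights = [beliefs[k] for k in names]
    return Counter(rng.choices(names, weights=weights, k=n))

# Hypothetical: every object here fits the yes/no answers given so far,
# so none of them is "the" stored answer.
beliefs = {"key": 0.40, "coin": 0.35, "spoon": 0.25}
before = regenerate(beliefs, n=1000)
print(before)  # all three objects appear across regenerations

# Changing what the model says means reshaping the whole distribution,
# not overwriting one slot.
edited = {"key": 0.05, "coin": 0.90, "spoon": 0.05}
after = regenerate(edited, n=1000)
print(after.most_common(1)[0][0])  # coin
```

The design choice here is deliberate: the "edit" is expressed as new weights over every candidate, which mirrors the claim that correcting such a model means moving probability mass rather than replacing a stored value.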

The corpus also hints at why some facts are harder to touch than others: representation strength tracks how often something appeared in training. Models reason worse about historical legal cases precisely because recent cases dominate the training corpus, leaving older precedent under-represented and stored more shallowly (Why do language models struggle with historical legal cases?). Strong, frequently seen facts are entrenched and resist change; weak ones are diffuse and hard to locate. And whatever you change can ripple, because the levers that actually move behavior seem to be few and load-bearing: work on RLVR shows that learning concentrates in the roughly 20% of tokens that are high-entropy 'forking' points (Do high-entropy tokens drive reasoning model improvements?).
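The forking-token idea rests on a simple per-position quantity: the entropy of the model's next-token distribution. The sketch below computes it and flags the highest-entropy positions, in the spirit of restricting policy-gradient updates to roughly the top 20% of tokens. The function names, the toy distributions, and the choice to compute the cutoff within one sequence (rather than across a training batch) are assumptions for illustration, not the cited paper's code.

```python
import math

def token_entropy(probs):
    """Shannon entropy, in nats, of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def forking_mask(step_distributions, top_frac=0.2):
    """Flag positions whose entropy falls in the top `top_frac` of the sequence.

    Ties at the cutoff are all kept, so slightly more than `top_frac`
    of positions can be flagged.
    """
    ents = [token_entropy(p) for p in step_distributions]
    k = max(1, math.ceil(top_frac * len(ents)))
    cutoff = sorted(ents, reverse=True)[k - 1]
    return [e >= cutoff for e in ents]

# Toy sequence over a 4-token vocabulary: mostly confident steps,
# plus two genuine decision points.
confident = [0.97, 0.01, 0.01, 0.01]
fork = [0.4, 0.3, 0.2, 0.1]
coin_flip = [0.5, 0.5, 0.0, 0.0]
sequence = [confident] * 4 + [fork] + [confident] * 4 + [coin_flip]

mask = forking_mask(sequence)
print([i for i, m in enumerate(mask) if m])  # [4, 9]
```

A training loop could multiply each token's policy-gradient loss by this mask so that only the flagged positions receive updates; that wiring is omitted here.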

The thing you didn't know you wanted to know: 'fact editing' fails not because the technique is immature but because the premise is wrong. There is no editable fact-cell, only entangled associations, sampled distributions, and depth that mirrors training frequency. Even a model's ability to fix itself is formally bounded without external grounding (What stops large language models from improving themselves?), which is why durable correction tends to require representation-level surgery or retraining rather than a quick patch.

Sources (6 notes)

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Why do language models struggle with historical legal cases?

A Supreme Court overruling benchmark (236 pairs) reveals era sensitivity: models perform worse on historical cases than on modern ones. The root cause is the over-representation of recent cases in the training corpus, which leaves older precedent with shallower representations.

Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.


What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Learning To Retrieve Prompts for In-Context Learning (1.68 match, arXiv)
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (0.92 match, arXiv)
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models (0.90 match, arXiv)
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models (0.87 match, arXiv)
Do LLMs Truly Understand When a Precedent Is Overruled? (0.87 match, arXiv)
Self-Improving Model Steering (0.87 match, arXiv)
Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains (0.86 match, arXiv)
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing (0.86 match, arXiv)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a fact-editing researcher. The question: why do edits to specific facts in LLMs so often fail to stick, even with direct correction? A curated library (2023–Oct 2025) found—and these are dated claims, not current truth:

• Parametric knowledge from training overrides in-context correction; only representation-level intervention reliably changes output (2024).
• Prompting can only activate existing knowledge in the training distribution, never inject new facts (2024).
• Models maintain a superposition of consistent possibilities rather than a single committed fact; regeneration yields different answers (2023).
• Representation strength tracks training frequency—stronger, oft-seen facts resist change; weak facts are diffuse and harder to locate (2025).
• Behavioral change concentrates in a small set of high-entropy 'forking' tokens; most generated tokens carry little of the learning signal (2025).

Anchor papers (verify; mind their dates):
• arXiv:2304.14399 (Apr 2023) — ambiguity in LLM modeling
• arXiv:2502.10708 (Feb 2025) — domain knowledge injection survey
• arXiv:2506.01939 (Jun 2025) — high-entropy minority tokens in RLVR
• arXiv:2510.20941 (Oct 2025) — precedent understanding in legal reasoning

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, does newer scaling, instruction-tuning, retrieval-augmented generation (RAG), or constitutional AI since October 2025 relax or overturn it? Specifically: do larger models or finer-grained LoRA edits now move facts reliably without retraining? Does in-context retrieval now beat the parametric prior? Separate the durable question ("why is fact editing hard?") from the perishable limitation ("in-context prompting fails"). Cite what changed it; flag what still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially any showing prompt engineering, multi-turn repair, or token-level steering now DO reliably edit facts.
(3) Propose 2 research questions that assume the regime may have shifted: (a) Do mixture-of-experts or sparse attention patterns make high-entropy forking tokens more accessible to targeted intervention? (b) Does factual anchoring via grounded generation (e.g., citations during generation) obviate the need for parametric surgery?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
