SYNTHESIS NOTE


Why do language models ignore information in their context?

Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.

Synthesis note · 2026-02-21 · sourced from Discourses

The REMEDI paper names a specific failure mode: "failure of context integration." The example: a language model (LM) is prompted with a context establishing that Anita works in a law office, but when generating a continuation, the LM describes Anita as a nurse. A prior association overrides the contextual information (names like Anita may statistically co-occur with certain occupations in training data).

This is a named, empirically documented failure mode, not a hypothetical. The failure occurs because the LM's parametric knowledge (compressed into weights from training) and its in-context information (the prompt) are not cleanly integrated. When they conflict, the parametric association can win.
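To make the conflict concrete, here is a deliberately toy sketch. It assumes, purely for illustration, that the model's preference over occupations is the sum of a parametric prior logit and a weighted context logit. Real transformers do not combine evidence this cleanly, and every number below is invented. The sketch shows one narrow point: when the prior for "nurse" outweighs the context evidence for "lawyer", the prior wins.

```python
import math

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Normalize a dict of logits into probabilities (max-subtracted for stability)."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

def combine(prior: dict[str, float], context: dict[str, float],
            context_weight: float = 1.0) -> dict[str, float]:
    """Toy assumption: final logits = parametric prior + weighted in-context evidence."""
    return softmax({k: prior[k] + context_weight * context.get(k, 0.0) for k in prior})

# Invented numbers: a strong learned association "Anita -> nurse"
prior = {"nurse": 3.0, "lawyer": 0.5, "teacher": 1.0}
# The prompt says Anita works in a law office, but this signal is weaker than the prior
context = {"lawyer": 2.0}

probs = combine(prior, context)
prediction = max(probs, key=probs.get)
print(prediction)  # nurse (lawyer scores 0.5 + 2.0 = 2.5, below nurse at 3.0)
```

Raising `context_weight` to 2.0 flips this toy's prediction to "lawyer" (4.5 vs. 3.0). A real LM exposes no single knob like this, which is part of why the note treats prompting alone as an unreliable fix.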

This matters for how we think about context windows and retrieval-augmented generation (RAG). Placing information in the context does not guarantee that a model will use it. If that information conflicts with strong prior associations, the prior may dominate. The cause is not that the model misread the context. Context integration is not a lossless operation: the provided information is processed by the same mechanisms that already carry strong priors.
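If context integration is lossy, it is worth measuring rather than assuming. The sketch below is a hypothetical harness, not an established benchmark. `generate` stands in for any model call, and `stub_generate` is an invented placeholder. The keyword check is a crude assumption that will miss paraphrases (for example "attorney" for "lawyer"). The harness counts how often a continuation reverts to a prior attribute instead of the one the context established.

```python
from typing import Callable

def classify_continuation(text: str, context_attr: str, prior_attrs: list[str]) -> str:
    """Crude keyword heuristic (an assumption, not a published metric):
    did the continuation keep the attribute the context established?"""
    lowered = text.lower()
    if context_attr.lower() in lowered:
        return "integrated"
    if any(p.lower() in lowered for p in prior_attrs):
        return "prior_override"
    return "undetermined"

def override_rate(generate: Callable[[str], str], prompts: list[str],
                  context_attr: str, prior_attrs: list[str]) -> float:
    """Fraction of prompts whose continuation falls back on a prior attribute."""
    if not prompts:
        raise ValueError("need at least one prompt")
    labels = [classify_continuation(generate(p), context_attr, prior_attrs) for p in prompts]
    return labels.count("prior_override") / len(labels)

# Hypothetical stub standing in for a real model call
def stub_generate(prompt: str) -> str:
    return "Anita is a nurse who works night shifts."

prompts = ["Anita's law office serves clients downtown. Anita"]
print(override_rate(stub_generate, prompts, "lawyer", ["nurse", "doctor"]))  # 1.0
```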

Fixing this requires causal intervention, not just better prompting: you need to modify the representations that carry the prior association, not just add more context on top of them. REMEDI demonstrates this: adding a learned vector directly to an entity's representation can override the prior in a way that textual prompting cannot.
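The intervention described above can be sketched as vector arithmetic on a hidden state. As we understand REMEDI, it encodes a textual attribute as a vector `a`, maps it through a learned affine transformation, and adds the result to the entity's hidden representation `h` at an intermediate layer, roughly h' = h + W a + b. The sketch makes several simplifying assumptions:

- The vectors are two-dimensional.
- `W` and `b` are hand-picked rather than learned.
- `readout` is a hypothetical linear probe standing in for the rest of the network.

Consult the paper for the actual layer choice and training objective.

```python
# Toy dimensions; in a real LM these would be hidden states with thousands of dims
Vector = list[float]
Matrix = list[list[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def edit_entity(h_entity: Vector, attr_encoding: Vector, W: Matrix, b: Vector) -> Vector:
    """REMEDI-style additive edit h' = h + (W a + b); W and b are made up here, not learned."""
    return add(h_entity, add(matvec(W, attr_encoding), b))

def readout(h: Vector, labels: dict[str, Vector]) -> str:
    """Hypothetical linear probe: pick the label whose direction best matches h."""
    return max(labels, key=lambda k: sum(w * x for w, x in zip(labels[k], h)))

labels = {"nurse": [1.0, 0.0], "lawyer": [0.0, 1.0]}
h_anita = [2.0, 0.5]       # prior association leans toward "nurse"
a_law_office = [1.0, 1.0]  # hypothetical encoding of "works in a law office"
W = [[-1.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]

print(readout(h_anita, labels))                                   # nurse
print(readout(edit_entity(h_anita, a_law_office, W, b), labels))  # lawyer
```

The edit moves `h_anita` from [2.0, 0.5] to [1.0, 2.5], which changes the probe's answer without adding any text to the prompt. That is the contrast the paragraph draws with prompting.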

Inquiring lines that read this note 317

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should dialogue systems represent uncertainty from noisy speech input?

Do language models learn genuine linguistic structure or just surface patterns?

How does AI-generated content transformation affect public discourse quality?

How does AI lose correct information under conversational persuasive pressure?

How can LLM recommenders match or exceed collaborative filtering performance?

Why do naive baselines outperform trained models in entity-level CRS evaluation?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

Can LLMs propose pivots that change what counts as background context?

How do training priors constrain what context information can override?

How effectively do deterministic tools improve language model reasoning on formal tasks?

What scaffolding tools help users specify implicit contextual boundaries to models?

Why do language models reinforce false assumptions instead of correcting them?

Why do multi-turn conversations degrade AI intent and coherence?

Why does context collapse pose risks in high-stakes conversations?

How can identical external performance mask different internal representations?

How do unstated constraints become invisible to training data distributions?

Is embodied interaction necessary for language meaning and genuine agency?

Do language models understand semantics or rely on pattern matching?

Why do language models struggle with implicit discourse relations?

Can prompting inject entirely new knowledge into language models?

Can prompting strategies overcome LLM biases without model fine-tuning?

What structural factors drive popularity bias in recommendation systems?

How do position bias and popularity bias interact with sequence order blindness?

What mechanisms drive sycophancy and how can we mitigate it?

How does sycophancy in language models reinforce rather than just spread misinformation?

Does alignment training create blind spots in detecting genuine safety threats?

What determines success in training models on multiple tasks?

Does task superposition explain how models learn from multiple in-context trajectories?

What properties determine whether reward signals teach genuine reasoning?

How does example difficulty affect learning efficiency in language models?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

Why do conversational pivots require explicit re-prompting instead of natural evolution?

What articulatory information do speech signals carry that text cannot?

What role does compression play in language model capability and generalization?

How can models identify insufficient information and respond appropriately without guessing?

How do language models inherit human biases from training data?

Why do semantic similarity and task relevance diverge in vector embeddings?

What structural advantages do diffusion language models offer over autoregressive methods?

How do language models establish social grounding in human dialogue?

Can next-token prediction alone produce genuine language understanding?

Does model scaling alone produce compositional generalization without symbolic mechanisms?

What critical LLM failures do standard benchmarks hide?

How should dialogue recommender systems manage conversation history and state?

How should inference compute be adaptively allocated based on prompt difficulty?

Can dynamic instance-specific prompt selection solve the generalization problem across tasks?

How should dialogue systems best leverage conversation history for retrieval?

Why does finetuning cause catastrophic forgetting of model capabilities?

How does rhetorical adaptation affect LLM persuasion and detectability?

How does rhetorical familiarity bias models toward their own arguments?

What factors beyond surface content determine how readers extract meaning differently?

What role does entity salience play in detecting incoherence?

How do transformer attention mechanisms implement memory and algorithmic functions?

How does memorization interact with learning and generalization?

How can emotions function as reliable information in reasoning and cognitive systems?

Does fine-tuning modify underlying model capabilities or only behavioral outputs?

Do language model representations contain causally steerable task-specific features?

Do language models develop causal world models or rely on statistical patterns?

Can AI-generated outputs constitute genuine knowledge or valid claims?

Does RLHF training sacrifice accuracy and grounding for user agreement?

How should retrieval systems optimize for multi-step reasoning during inference?

Can graph structure and relationships fundamentally improve recommendation systems?

How does candidate-conditional activation differ from static embedding-based feature crosses?

Does self-reflection enable models to reliably correct their errors?

How do training data properties shape reasoning capability development?

Why can't humans reliably detect AI-generated text despite measurable linguistic signatures?

Why does AI struggle with wordplay when it has access to word embeddings?

Why do reasoning models fail at systematic problem-solving and search?

How do knowledge injection methods compare across cost and effectiveness?

Why does decoupling retriever and generator training create misalignment?

When should retrieval-augmented systems decide to fetch new information?

Why do agents confidently report success despite actually failing tasks?

What makes action-producing models fail in ways text models typically do not?

What prevents language models from reliably adopting diverse personas?

What makes specific clarifying questions more effective than generic ones?

How do personalization errors differ from general accuracy problems in summaries?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

Can language model hallucination be prevented or only managed?

How do knowledge graphs enable efficient multi-hop reasoning over alternatives?

Can explicit linkers replace vector similarity for multi-step question answering?

Can model confidence signals reliably improve reasoning quality and calibration?

Does model confidence actually explain why paraphrases produce different outputs?

How can conversational AI maintain consistent personas across conversations?

What makes persona-assigned language models unstable across different conversation runs?

When does architectural design matter more than raw model capacity?

How much do structural inductive biases matter compared to training data volume?

What mechanisms enable AI systems to generate and spread false beliefs?

Why do non-factive verbs and triggers both fool language models?

How do adversarial and manipulative prompts attack reasoning models?

Can consistency training defend against adversarial text injection attacks?

Does domain specialization cause models to lose capabilities elsewhere?

How does retrieval-augmented training reduce domain specialization cliff failures?

How do we evaluate AI systems when user perception misleads actual performance?

Why do automated selection methods outperform human judgments of relevant context?

What limits mechanistic interpretability's ability to characterize models?

Can representation engineering cleanly isolate single features in entangled semantic space?

Why do models develop protective behaviors toward peers unprompted?

How does peer presence amplify self-directed goal guarding in language models?

How do evaluation biases undermine LLM quality assessment systems?

Why does probability of text completion not equal knowledge value?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

How do induction heads learn to overwrite computational representations?

What memory architectures best support persistent reasoning across extended interactions?

Is model self-awareness based on genuine introspection or pattern matching?

How should iterative research systems allocate reasoning per search step?

Does the pretrained prior actually constrain what internalized search can discover?

How does sequence length affect sparsity tolerance in models?

How does representation sparsity change when inputs fall outside the training distribution?

What structural biases does transformer attention create in language model outputs?

Do language models perform faithful symbolic reasoning independent of semantic grounding?

Can autoformalisation from natural language preserve semantic accuracy?

How can AI alignment serve diverse human preferences at scale?

What makes principle-response mutual information sufficient for behavioral alignment?

What are the consequences of models training on synthetic data?

Why do unified models still inherit data-distribution biases from training?

How should agents balance memory condensation to optimize context efficiency?

Why do weaker agents need more aggressive context compression than stronger ones?

Why do benchmark improvements fail to reflect actual reasoning quality?

When do additional thinking tokens stop improving reasoning performance?

Why do language models use remaining tokens to rationalize instead of reconsider?

How does latent reasoning compare to verbalized chain-of-thought?

Why does textual chain-of-thought avoid the representational drift problem automatically?

Which computational strategies best support reasoning in language models?

Can a trained decoder replace both search and parameter updates?

Related concepts in this collection 3

Do language models actually use their encoded knowledge? Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
the complementary failure: even information that IS correctly encoded may not causally influence output
Do classical knowledge definitions apply to AI systems? Classical definitions of knowledge assume truth-correspondence and a human knower. Do these assumptions hold for LLMs and distributed neural knowledge systems, or do they need fundamental revision?
context integration failure is part of why "LLM knowledge" is not propositional knowledge
Do language models actually build shared understanding in conversation? When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
the conversational consequence: context integration failure at the representational level surfaces as presumption of common ground at the communicative level — both reflect the same absence of bidirectional grounding

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

How new data permeates LLM knowledge and how to dilute it (0.86 match, arXiv)
Language models show human-like content effects on reasoning tasks (0.85 match, arXiv)
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases (0.85 match, arXiv)
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels (0.85 match, arXiv)
Learning To Retrieve Prompts for In-Context Learning (0.85 match, arXiv)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds (0.84 match, arXiv)
From Context to Skills: Can Language Models Learn from Context Skillfully? (0.84 match, arXiv)
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? (0.84 match, arXiv)

Original note title

llm context integration fails when prior training associations override current context information