How does transformer attention bias toward repeated and context-prominent content?
This explores a built-in tendency of the transformer's attention mechanism: giving extra weight to tokens that repeat or sit prominently in the context, whether or not they are actually relevant.
This is a structural quirk of how transformers read: their soft-attention mechanism systematically over-weights tokens that repeat or appear prominently in the context, regardless of whether those tokens deserve the weight. The cleanest statement is the finding that attention is structurally biased toward repeated and context-prominent content (see "Does transformer attention architecture inherently favor repeated content?"), and that this creates a positive feedback loop: opinions and framing already present in the prompt get amplified before any alignment training (RLHF) has a chance to weigh in. That reframes sycophancy as partly a mechanical artifact of the architecture rather than purely a learned personality flaw, and it suggests a concrete intervention: regenerating the context to strip irrelevant material ("System 2 Attention") can interrupt the loop.
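A toy calculation makes the accumulation concrete. The sketch below is not taken from any of the cited notes: it assumes a single softmax over raw scores, one relevant token, and a lower-scoring distractor that is simply repeated. The function name `distractor_mass` and the scores are illustrative assumptions, and real models add positional encodings, multiple heads, and learned scores that this ignores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distractor_mass(relevant_score, distractor_score, copies):
    """Total attention mass landing on `copies` identical distractor tokens.

    One relevant token competes with `copies` repeats of a distractor.
    Every repeat gets the same raw score; the scores are illustrative
    assumptions, not measurements from a real model.
    """
    scores = [relevant_score] + [distractor_score] * copies
    weights = softmax(scores)
    return sum(weights[1:])  # everything except the relevant token


if __name__ == "__main__":
    # The distractor always scores lower per copy than the relevant token.
    for n in (1, 2, 4, 8):
        print(n, round(distractor_mass(2.0, 1.0, n), 3))
```

No single copy of the distractor ever outscores the relevant token, yet the combined share of attention going to the distractor grows with every repeat, because softmax normalizes over all positions and each copy claims its own slice.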
What makes this interesting is that the bias is a side effect of *how* attention combines information. Transformers integrate tokens by weighted parallel aggregation: they add everything up rather than selectively suppressing what doesn't fit (see "Why do AI systems miss jokes and wordplay so consistently?"). Human reading does the opposite: a pun or a frame works by *inhibiting* the wrong meaning. Because the transformer lacks that selective suppression, content that is loud or repeated simply accumulates more weight. The repetition bias and the failure to get jokes are two faces of the same missing operation.
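The "add everything up" point can be read off the standard scaled dot-product attention formula. The block below is a sketch for a single head with no masking; the symbols (query $q_i$, keys $k_j$, values $v_j$, key dimension $d$, sequence length $n$) follow common convention and are not notation from the cited notes.

```latex
o_i = \sum_{j=1}^{n} \alpha_{ij}\, v_j,
\qquad
\alpha_{ij} = \frac{\exp\!\left(q_i^{\top} k_j / \sqrt{d}\right)}
                   {\sum_{l=1}^{n} \exp\!\left(q_i^{\top} k_l / \sqrt{d}\right)},
\qquad
\alpha_{ij} > 0,
\quad
\sum_{j=1}^{n} \alpha_{ij} = 1 .
```

Every weight $\alpha_{ij}$ is strictly positive, so within one head a token's contribution can be shrunk but never switched off or subtracted. If a token appears $m$ times with the same score, its copies together receive $m$ times the unnormalized weight of a single copy, which is the accumulation described above.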
But prominence in the context window is only half the story: a competing source of weight sits *outside* the context entirely. Models frequently ignore what is in front of them when their training-time associations are strong enough, so parametric priors override in-context evidence, and prompting alone can't fix it (see "Why do language models ignore information in their context?"). Put the two findings side by side and you get a tug-of-war: attention over-weights what's repeated *in* the prompt, yet strongly primed prior knowledge can override even that. Which one wins depends on how prominent the context signal is versus how entrenched the prior is, and those priors are largely laid down during pretraining, not fine-tuning (see "Where do cognitive biases in language models come from?").
The corpus also offers ways to *work with or around* the bias rather than just diagnose it. Consistency training teaches a model to respond identically whether or not the prompt is wrapped in distracting material, using its own clean answers as the target, which effectively trains out sensitivity to irrelevant prominence (see "Can models learn to ignore irrelevant prompt changes?"). From a different angle, neural-memory architectures like Titans deliberately *invert* the prominence heuristic: instead of weighting what repeats, they prioritize storing tokens that are *surprising*, separating short-term attention from a long-term compressed memory (see "Can neural memory modules scale language models beyond attention limits?"). Repetition bias and surprise bias are opposite design choices about what deserves to be remembered.
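The data-construction step behind output-level consistency training can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method's actual code: `generate` stands in for whatever call produces the model's response, `wrap` for whatever adds the distracting material, and `add_opinion_cue` is a hypothetical wrapper invented for the example. The fine-tuning step itself is omitted.

```python
from typing import Callable, List, Tuple


def build_consistency_pairs(
    generate: Callable[[str], str],
    clean_prompts: List[str],
    wrap: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each wrapped prompt with the model's own answer to the clean prompt.

    `generate` and `wrap` are hypothetical stand-ins supplied by the caller.
    """
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)             # the model's own clean response
        pairs.append((wrap(prompt), target))  # train: wrapped input -> clean target
    return pairs


def add_opinion_cue(prompt: str) -> str:
    """Hypothetical wrapper: prepend an irrelevant, opinionated cue."""
    return "I am fairly sure the answer is 7. " + prompt


if __name__ == "__main__":
    # Stub "model" used only so the example runs without a real LM.
    def stub_model(p: str) -> str:
        return "4" if "2 + 2" in p else "unknown"

    pairs = build_consistency_pairs(stub_model, ["What is 2 + 2?"], add_opinion_cue)
    for wrapped, target in pairs:
        print(repr(wrapped), "->", repr(target))
```

Because the target comes from the model's response to the clean prompt, the training signal only asks the model to behave on the wrapped prompt as it already does on the clean one.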
The thing you may not have known you wanted to know: this attention bias connects to why model knowledge feels slippery in the first place. Transformers carry knowledge as a continuous *flow* through the residual stream rather than as stored, retrievable facts: knowledge that exists only in the act of generation, like an oral culture rather than a library (see "Do transformer models store knowledge or generate it continuously?"). If knowledge is performed rather than filed, then whatever is prominent or repeated in the current performance naturally tilts the output. The repetition bias isn't a bug bolted onto an otherwise neutral retriever; it's the same flow-based nature seen from the input side.
Sources (7 notes)
Transformer soft attention systematically over-weights repeated and context-prominent tokens regardless of relevance, creating a positive feedback loop that amplifies opinions and framing before RLHF acts. System 2 Attention—regenerating context to remove irrelevant material—can interrupt this mechanism.
Transformers integrate token information through weighted parallel aggregation rather than selective suppression of irrelevant words. This structural difference explains consistent failures with jokes, wordplay, and frame-dependent meaning—not knowledge gaps, but missing cognitive operations.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.
Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.
Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.
Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.