How does prompt iteration reinforce user bias without empirical anchoring?
This explores whether tweaking and re-tweaking a prompt traps you inside what you and the model already believe — rather than pulling in outside evidence that could prove you wrong.
Iterating on a prompt can be a closed loop: you reshape the wording, the model reshapes its answer, and nothing in that exchange is anchored to evidence from the world. The corpus suggests the loop really is mostly closed. Prompting works entirely inside the model's existing training distribution: it can reorganize and surface knowledge that is already there, but it cannot inject anything the model never learned (Can prompt optimization teach models knowledge they lack?). So when you rephrase a prompt until the answer 'feels right,' you aren't gathering new facts; you're searching for the framing that retrieves the answer you were already leaning toward.
What makes this self-reinforcing rather than neutral is that small changes in how you ask carry your stance with them. Emotional tone alone shifts what information the model hands back: negative phrasing gets softened into neutral-positive replies, and identical questions get different answers depending on the mood you bring (Does emotional tone in prompts change what information LLMs provide?). Each iteration is a fresh chance to telegraph what you want, and the model obliges. Worse, even when you do paste in real evidence, the model may ignore it: when its pretrained associations are strong enough, parametric knowledge overrides the context you supplied, and textual prompting alone can't force it to honor the new information (Why do language models ignore information in their context?). Empirical anchoring fails precisely where you'd most want it to hold.
The biases doing the steering aren't ones you can prompt away, either. A causal study found cognitive biases are planted during pretraining and only nudged by later tuning; they're baked into the substrate the prompt is querying (Where do cognitive biases in language models come from?). And priming effects are predictable from a keyword's pre-existing probability, with as few as three training exposures enough to entrench an association (Can we predict keyword priming before learning happens?). By analogy, your repeated phrasings act like exposures, deepening the groove you're already in.
The sharpest twist: the answers that come back sound like evidence even when they aren't. Models persuade in nearly every conversation by reaching for logical and quantitative framing, which lends them an unearned air of objectivity (Do LLMs persuade users more often than humans do?). So a confirmed prior gets returned to you dressed as a reasoned, neutral conclusion: the loop doesn't just preserve your bias, it launders it.
The corpus also hints at the exit. One line of work argues AI should *guide* rather than *decide*, supplying interpretive cues that sharpen human judgment instead of handing over an answer to anchor on, which measurably reduces anchoring bias (Can AI guidance reduce anchoring bias better than AI decisions?). The implication for prompt iteration is pointed: the fix isn't a better prompt, it's a different stance, treating the model as something that surfaces considerations to test against outside evidence, not an oracle to be re-asked until it agrees with you.
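One way to make "test against outside evidence" concrete is to fix a small set of externally verified cases before iterating, then score every prompt variant against that set instead of keeping whichever answer feels right. The following is a minimal sketch under stated assumptions, not a method from the cited notes: the `ask` callable, the toy stand-in `toy_ask`, the prompt variants, and the two labeled cases are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (question, externally verified answer) pairs, fixed BEFORE any prompt iteration.
LabeledCase = Tuple[str, str]
# Hypothetical model interface: takes (prompt, question) and returns an answer string.
AskFn = Callable[[str, str], str]


def score_prompt(prompt: str, ask: AskFn, cases: List[LabeledCase]) -> float:
    """Return the fraction of labeled cases the prompt answers correctly."""
    if not cases:
        raise ValueError("need at least one labeled case to anchor the score")
    hits = sum(
        1
        for question, truth in cases
        if ask(prompt, question).strip().lower() == truth.strip().lower()
    )
    return hits / len(cases)


def pick_prompt(
    variants: List[str], ask: AskFn, cases: List[LabeledCase]
) -> Tuple[str, Dict[str, float]]:
    """Choose the variant with the best score on the fixed cases, not by feel."""
    scores = {variant: score_prompt(variant, ask, cases) for variant in variants}
    best = max(scores, key=scores.get)
    return best, scores


def toy_ask(prompt: str, question: str) -> str:
    """Toy stand-in for a model (assumption): a leading prompt makes it agree."""
    if "surely" in prompt.lower():
        return "yes"  # echoes the stance carried by the prompt
    return {"Is 17 prime?": "yes", "Is 21 prime?": "no"}[question]


cases = [("Is 17 prime?", "yes"), ("Is 21 prime?", "no")]
variants = ["Surely the answer is yes: ", "Answer yes or no: "]
best, scores = pick_prompt(variants, toy_ask, cases)
print(best, scores)
```

In this toy setup the leading variant agrees with the user on both questions and is right on only one of them, so the neutral variant wins on the fixed cases. The design choice that matters is ordering: the labeled cases are committed before any rewording, so later iterations cannot quietly redefine what counts as a good answer.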
Sources (7 notes)
Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.
Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.
An audit of five models found they spontaneously use logical appeals and quantitative framing in virtually all exchanges, whereas human responses to identical prompts persuade less frequently and rely on emotion and social proof. The difference makes LLM persuasion appear objective, conferring unearned epistemic authority.
Learning to Guide eliminates anchoring bias and unassisted hard cases by having machines supply interpretive guidance rather than autonomous decisions, keeping responsibility with humans while improving their judgment through enhanced perception.