How much of prompt sensitivity is really just frequency optimization in disguise?
This explores a deflationary reading of prompt sensitivity — the idea that models swing wildly on rephrased prompts not because wording carries deep meaning, but because some phrasings simply land closer to high-probability regions of the training distribution, making 'prompt engineering' a search for frequent patterns rather than better communication.
The corpus gives this suspicion real support, but it also complicates it. The strongest backing comes from work showing that prompt optimization can only activate knowledge already present in the model; it cannot inject anything new (Can prompt optimization teach models knowledge they lack?). If prompting only reorganizes what is already in the training distribution, then a 'good' prompt is partly one that lands in a well-trodden, high-frequency region, which is exactly what the frequency-optimization framing predicts.
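If the frequency-optimization reading were the whole story, 'prompt search' would reduce to likelihood ranking: score each candidate phrasing by how probable the model finds it, then keep the most probable one. The sketch below makes that concrete under several assumptions:

- A toy add-one-smoothed bigram model, trained on a caller-supplied corpus, stands in for a real model's training distribution.
- The function names are hypothetical and are not taken from any cited work.

```python
import math
from collections import Counter

def train_bigram(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count unigrams and bigrams over a lowercased, whitespace-tokenized toy corpus."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for line in corpus:
        tokens = ["<s>"] + line.lower().split()  # <s> marks the start of a prompt
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def avg_logprob(text: str, unigrams: Counter, bigrams: Counter) -> float:
    """Mean log P(token | previous token) under add-one (Laplace) smoothing."""
    tokens = ["<s>"] + text.lower().split()
    if len(tokens) < 2:
        raise ValueError("cannot score an empty prompt")
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen tokens
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(tokens) - 1)  # per-token average, so length alone is not penalized

def most_probable_phrasing(candidates: list[str], unigrams: Counter, bigrams: Counter) -> str:
    """'Prompt search' as frequency optimization: keep the phrasing the model finds most probable."""
    return max(candidates, key=lambda c: avg_logprob(c, unigrams, bigrams))
```

Suppose the model is trained on three lines: 'summarize the following text', 'summarize the following article', and 'please summarize the text'. It then ranks 'summarize the following text' above the unseen paraphrase 'condense what comes next'. Averaging per token keeps longer prompts from being penalized merely for having more tokens.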
But the corpus reframes the driver as confidence rather than raw frequency. ProSA found that prompt sensitivity reflects how confident the model is: highly confident models shrug off rephrasing, while low-confidence ones swing hard (Does model confidence predict robustness to prompt changes?). Frequency and confidence are cousins, since a model tends to be confident where its training data was dense. This is therefore consistent with the disguise hypothesis, but it relocates the cause from the prompt's wording to the model's internal certainty about that region. Notably, larger models, few-shot examples, and objective tasks are all associated with higher confidence and lower sensitivity. The same wording stops mattering as the model's grip on the territory firms up.
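You could test the confidence reading on your own data as follows:

1. Run several rewordings of each question.
2. Record each answer together with the probability the model assigned to it.
3. Check whether per-question confidence and per-question sensitivity move in opposite directions.

The sketch below assumes a simplified sensitivity score: the fraction of rewording pairs whose answers disagree. This is an illustration, not ProSA's published metric, and the input format is likewise an assumption.

```python
import math
from itertools import combinations

def sensitivity(answers: list[str]) -> float:
    """Fraction of rewording pairs whose answers disagree (0.0 means fully robust)."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)

def confidence(top_probs: list[float]) -> float:
    """Mean probability the model assigned to its chosen answer across rewordings."""
    return sum(top_probs) / len(top_probs)

def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation; raises if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

def confidence_vs_sensitivity(instances: list[list[tuple[str, float]]]) -> float:
    """Each instance lists (answer, probability of that answer) for every rewording of one question.

    The confidence account predicts a negative return value.
    """
    confs = [confidence([p for _, p in runs]) for runs in instances]
    sens = [sensitivity([a for a, _ in runs]) for runs in instances]
    return pearson(confs, sens)
```

A negative correlation is what the confidence account predicts. On its own, it does not separate confidence from training-data frequency, because the two are expected to move together.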
The 'in disguise' part starts to break down once you look at what actually varies. Prompt effectiveness depends sharply on model tier. In a 23-prompt benchmark across 12 LLMs, rephrasing and background-knowledge prompts lift cheap models, while step-by-step reasoning actively hurts strong ones; the deciding factor is task structure, not generic best practices (Do prompt techniques work the same across all LLM tiers?). That is hard to explain as pure frequency-matching: if it were just frequency, the same 'frequent' phrasings would help everywhere. Similarly, prompt quality can be measured from the prompt itself, independent of the model's outputs, along six structured dimensions (communication, cognition, instruction, logic, hallucination, and responsibility) that draw on Grice, cognitive load theory, and instructional design (Can we measure prompt quality independent of model outputs?). There's genuine signal in *how you communicate*, not only in *which tokens are common*.
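The tier argument can be phrased as a simple check. A pure frequency story predicts that a given phrasing moves accuracy in the same direction on every model. Any technique that helps one tier and hurts another is therefore evidence against it.

Here is a minimal sketch. It assumes a hypothetical `deltas[technique][tier]` table of accuracy changes relative to a no-technique baseline, filled in from your own measurements; no results are embedded here.

```python
def effect_flips(deltas: dict[str, dict[str, float]]) -> list[str]:
    """Return techniques that raise accuracy on at least one tier and lower it on another.

    deltas[technique][tier] is accuracy with the technique minus baseline accuracy
    (a hypothetical layout). A pure frequency account predicts this list is empty.
    """
    flipped = []
    for technique, by_tier in deltas.items():
        values = list(by_tier.values())
        helps_somewhere = any(v > 0 for v in values)
        hurts_somewhere = any(v < 0 for v in values)
        if helps_somewhere and hurts_somewhere:
            flipped.append(technique)
    return sorted(flipped)
```

The check looks only at direction, not magnitude. With real benchmark numbers, you would also want to confirm that each delta exceeds run-to-run noise before counting a flip.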
The most telling counter-evidence is that sensitivity can be trained away or bypassed. Consistency training teaches a model to respond identically to clean and perturbed (wrapped) prompts, using its own clean answers as targets. This works either at the output level (BCT) or at the activation level (ACT) (Can models learn to ignore irrelevant prompt changes?). The sensitivity was therefore a removable artifact, not a law of how prompts work. Architecture-level methods such as PsychAdapter go further: they bypass prompt phrasing altogether by writing the desired behavior directly into every transformer layer (Can we control personality in language models without prompting?). If you can dial behavior in without touching the prompt, then the prompt was never the fundamental control surface.
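The consistency-training recipe can be sketched in two parts:

- A BCT-style step builds supervised pairs from the model's own clean answers.
- An ACT-style objective penalizes activation differences between wrapped and clean prompts.

This is a simplified illustration, not the published implementation. `generate` and `wrap` are hypothetical callables you would supply, and the activation loss assumes the two activation lists are already aligned position by position.

```python
from typing import Callable

def build_bct_pairs(
    clean_prompts: list[str],
    generate: Callable[[str], str],
    wrap: Callable[[str], list[str]],
) -> list[tuple[str, str]]:
    """BCT-style data: pair every wrapped prompt with the model's own answer to the clean prompt."""
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)  # the model's clean response becomes the training target
        for wrapped in wrap(prompt):  # e.g. the same request with irrelevant cues added
            pairs.append((wrapped, target))
    return pairs

def act_loss(clean_acts: list[list[float]], wrapped_acts: list[list[float]]) -> float:
    """ACT-style objective: mean squared difference between aligned activation vectors.

    In a real training loop the clean activations would be detached (held constant)
    so that only the wrapped-prompt activations are pulled toward them.
    """
    if len(clean_acts) != len(wrapped_acts):
        raise ValueError("activation lists must be aligned position by position")
    total, count = 0.0, 0
    for clean_vec, wrapped_vec in zip(clean_acts, wrapped_acts):
        if len(clean_vec) != len(wrapped_vec):
            raise ValueError("activation vectors must have the same width")
        for c, w in zip(clean_vec, wrapped_vec):
            total += (w - c) ** 2
            count += 1
    return total / count if count else 0.0
```

The targets come from the same model's clean behavior. As a result, the fine-tuning data does not go stale the way a fixed SFT dataset written for an older model can.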
So the honest answer: a large share of prompt sensitivity *is* frequency optimization in disguise. You are often paying to find phrasings that match dense, confident regions of the model's distribution, and you cannot prompt past the ceiling of what was trained in (Can prompt optimization teach models knowledge they lack?). But not all of it. Some residue survives:

- task-structure dependence;
- measurable communication quality;
- the fact that sensitivity can be trained out or sidestepped at the weights.

That residue is real signal about reasoning and communication, not just frequency. The deeper surprise is that there is also a theoretical bound on the *other* side. In principle, the right prompt can make a single finite-size transformer compute any computable function (Can a single transformer become universally programmable through prompts?). Yet standard training rarely produces models that exploit that power. Prompt sensitivity therefore sits in the gap between what prompts could theoretically do and what frequency-shaped training actually taught the model to do.
Sources (7 notes)
Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from its training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge; it can only reorganize what already exists.
ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.
A 23-prompt benchmark across 12 LLMs shows rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning reduces accuracy in high-performance models. Task structure, not generic best practices, determines which prompts help.
Research identifies six evaluable dimensions—Communication, Cognition, Instruction, Logic, Hallucination, and Responsibility—with 20 sub-criteria based on Grice, cognitive load theory, and instructional design. Improvements in one dimension cascade to others, revealing prompt quality as a structured space rather than a flat checklist.
Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.
PsychAdapter modifies every transformer layer with <0.1% additional parameters to achieve 87.3% Big Five accuracy and 96.7% depression/life satisfaction accuracy across GPT-2, Gemma, and Llama 3. This architecture-level approach bypasses prompt resistance entirely.
Research proves a single finite-size transformer exists that can compute any computable function given the right prompt, achieving complexity bounds nearly matching unbounded models. However, standard training rarely produces models that learn to implement arbitrary programs this way.