INQUIRING LINE

Can prompt optimization inject genuinely new knowledge into a model?

This explores whether tweaking or optimizing prompts can actually add new knowledge to a model — or whether it can only rearrange and surface what the model already learned during training.


The corpus is unusually direct on this: prompt optimization cannot inject new knowledge. Prompting operates entirely inside a model's pre-existing training distribution, so no clever prompt can supply domain facts the model never learned; it can only retrieve and reactivate what is latent [Can prompt optimization teach models knowledge they lack?]. That's a hard ceiling: prompts are a flashlight, not a library.

What makes this interesting is the apparent contradiction lurking next door. There's a proof that prompts are Turing-complete: a single finite transformer exists that, given the right prompt, can compute any computable function [prompt-optimization-is-turing-complete... | prompting-is-turing-complete-a-single-finite-size-transformer-can-compute-any-co]. So if prompts can in principle 'program' a model to do anything, why can't they teach it new facts? The resolution is the gap between computation and knowledge. A prompt can route and reconfigure existing capabilities (the same notes flag that standard training rarely produces models that actually implement arbitrary programs this way), but routing isn't the same as installing missing information. You can rewire what's in the box; you can't conjure what was never put in.

The corpus reinforces this from the failure side. When models are asked to do something genuinely outside their learned repertoire, like executing real iterative numerical methods, they don't compute; they pattern-match memorized templates and emit plausible-but-wrong answers, and this persists across scale [Do large language models actually perform iterative optimization?]. Even RL fine-tuning, which is far more invasive than prompting, mostly sharpens template-matching rather than installing new procedures [Do fine-tuned language models actually learn optimization procedures?]. If fine-tuning struggles to add genuinely new reasoning, prompting, which changes nothing about the weights, has no chance.
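For contrast, here is what actually executing an iterative method involves: each step consumes the numeric result of the step before it, so the answer can't be reached by recalling a familiar-looking template. This is a minimal sketch; Newton's method for square roots is chosen only as an illustration and is not the benchmark used in the cited notes, and `newton_sqrt` is a hypothetical name.

```python
def newton_sqrt(a: float, tol: float = 1e-12, max_iter: int = 50) -> tuple[float, int]:
    """Approximate sqrt(a) with Newton's method; returns (estimate, iterations used)."""
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0, 0

    x = a if a >= 1 else 1.0  # any positive starting guess converges
    for i in range(1, max_iter + 1):
        nxt = 0.5 * (x + a / x)  # the update depends on the previous iterate
        if abs(nxt - x) <= tol * nxt:  # stop once the relative change is tiny
            return nxt, i
        x = nxt
    return x, max_iter


estimate, steps = newton_sqrt(2.0)
print(f"sqrt(2) ~ {estimate:.12f} after {steps} iterations")
```

The point of the sketch is the loop: skipping it and writing down a number that "looks like" a square root is the template-matching failure the notes describe.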

The more surprising lateral finding: prompt optimization often doesn't even add knowledge; it adds you. Prompt engineering works as a divergence-minimizing loop where users iteratively steer outputs toward what they already expected, making results a co-production of model and user priors [How much does the user shape what a model generates?]. So when a refined prompt 'reveals' something new, the novelty may be the user's framing surfacing, not the model gaining knowledge. And prompt sensitivity tracks model confidence: where the model knows something well it resists rephrasing, but where it's unsure, outputs swing wildly with wording [Does model confidence predict robustness to prompt changes?], which is exactly what you'd expect if prompts surface knowledge rather than create it.
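One rough way to probe the sensitivity claim is to ask the same question in several phrasings and check how often the answers agree. The sketch below is an illustration only: `paraphrase_agreement` is a hypothetical helper, not the metric used in the ProSA work, and it assumes the model's answers have already been collected as short strings.

```python
from collections import Counter


def paraphrase_agreement(answers: list[str]) -> float:
    """Fraction of answers that match the most common answer (1.0 = fully stable)."""
    if not answers:
        raise ValueError("need at least one answer")
    # Light normalization so trivial case/whitespace differences don't count as swings.
    normalized = [a.strip().lower() for a in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


# Hypothetical answers to paraphrases of one well-known and one shaky question.
stable = paraphrase_agreement(["Paris", "paris ", "Paris"])
shaky = paraphrase_agreement(["4", "5", "4", "7"])
print(stable, shaky)
```

If prompts only surface existing knowledge, low agreement should cluster on questions the model is unsure about, which is the pattern the note reports.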

Where prompt optimization genuinely earns its keep is allocation and structure, not content. Optimizing prompts jointly with the inference strategy yields up to 50% gains [Does prompt optimization without inference strategy fail?], adaptive compute-per-prompt beats fixed budgets [Can we allocate inference compute based on prompt difficulty?], and prompts plus agent topology can be co-optimized as computational graphs [Can we automatically optimize both prompts and agent coordination?]. All of this makes a model use what it has more effectively, which is the real takeaway: prompt optimization is a knowledge-activation tool, not a knowledge-acquisition tool. If the foundational knowledge isn't in training, no prompt will rescue you.
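To make the allocation idea concrete, here is a minimal sketch of splitting a fixed sampling budget across prompts by estimated difficulty instead of uniformly. Everything in it is an assumption for illustration: the `allocate_samples` name, the proportional rule, and the premise that a difficulty score in [0, 1] is available per prompt. The cited work may allocate compute quite differently.

```python
def allocate_samples(difficulties: list[float], total_budget: int,
                     min_samples: int = 1) -> list[int]:
    """Split total_budget samples across prompts in proportion to difficulty.

    Every prompt gets at least min_samples; the remainder is shared
    proportionally, with leftovers assigned by largest fractional part.
    """
    n = len(difficulties)
    if n == 0:
        return []
    if total_budget < n * min_samples:
        raise ValueError("budget too small to give every prompt min_samples")

    spare = total_budget - n * min_samples
    weight_sum = sum(difficulties)
    if weight_sum <= 0:
        shares = [spare / n] * n  # no difficulty signal: fall back to uniform
    else:
        shares = [spare * d / weight_sum for d in difficulties]

    alloc = [min_samples + int(s) for s in shares]
    leftover = total_budget - sum(alloc)
    # Hand out the rounding leftovers to the largest fractional remainders.
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


print(allocate_samples([0.1, 0.9], total_budget=12))
```

The total compute is unchanged; only its distribution moves toward the hard prompts, which is why this counts as better use of existing capability rather than new knowledge.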


Sources (9 notes)

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Can a single transformer become universally programmable through prompts?

Research proves a single finite-size transformer exists that can compute any computable function given the right prompt, achieving complexity bounds nearly matching unbounded models. However, standard training rarely produces models that learn to implement arbitrary programs this way.

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

How much does the user shape what a model generates?

Foundation Priors research shows prompt engineering as divergence minimization between synthetic output and user priors. The refinement process systematically steers generation toward what users already expect, making outputs co-productions of model and user subjectivity.

Does model confidence predict robustness to prompt changes?

ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

Does prompt optimization without inference strategy fail?

Prompts optimized without knowledge of the inference strategy (best-of-N, majority voting) systematically underperform. Joint optimization of both prompt and inference strategy yields up to 50% improvement across reasoning and generation tasks.

Can we allocate inference compute based on prompt difficulty?

Research shows inference effectiveness varies dramatically by prompt difficulty. Reallocating the same total compute adaptively—giving easy prompts less and hard ones more—substantially outperforms larger models under uniform budgets.

Can we automatically optimize both prompts and agent coordination?

Language agents represented as computational graphs—where nodes are operations and edges define information flow—reveal that CoT, ToT, and Reflexion are formally equivalent structures. This unified view enables automatic optimization of both node prompts and edge connectivity without manual redesign.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether prompt optimization can inject genuinely new knowledge into LLMs. A curated library of arXiv papers (2023–present) claims it cannot—but those claims are dated; your job is to judge whether newer work has shifted the ground.

What a curated library found — and when (dated claims, not current truth):
• Prompts cannot supply domain facts a model never learned; they only activate latent knowledge by routing existing capabilities (2024–2025).
• Prompts are Turing-complete in principle, yet this computational universality doesn't translate to knowledge injection—the gap between computation and content remains (~2024).
• Models asked to execute genuinely novel iterative procedures fall back to memorized template matching, not genuine reasoning; RL fine-tuning (far more invasive than prompting) mostly sharpens templates rather than installing new procedures (2024–2025).
• Prompt optimization's real wins are allocation and structure (joint prompt and inference-strategy optimization, adaptive compute per prompt, agent topology co-optimization), with up to 50% gains reported for the joint optimization; these gains come from better activation, not new knowledge (2024–2025).
• Prompt sensitivity tracks model confidence: where the model is confident, outputs resist rephrasing; where unsure, outputs swing wildly—consistent with knowledge surfacing, not creation (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2411.01992 (2024-11) — Ask, and it shall be given: Turing completeness of prompting
• arXiv:2402.16823 (2024-02) — Language Agents as Optimizable Graphs
• arXiv:2504.07912 (2025-04) — Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
• arXiv:2508.10030 (2025-08) — Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether recent scaling, synthetic data injection during inference, retrieval-augmented generation baked into prompting, or novel fine-tuning approaches have since RELAXED or OVERTURNED it. Separate the durable claim (prompts alone cannot add training-absent knowledge) from the perishable limitation (specific techniques may now bridge that gap). Cite what changed it; flag where the constraint still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months (July 2026 onward) that claims prompts *can* inject knowledge, or that redefines what "knowledge injection" means.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., "Can in-context learning plus adaptive retrieval plus prompt optimization jointly synthesize genuinely novel inferences?" or "Does 'knowledge injection' need redefinition if prompts can coordinate external tools that generate novel outputs?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines