Can optimal experimental design improve few-shot example selection?
Rather than picking examples by similarity, could actively selecting the most informative unlabeled examples—those that reduce the model's prediction uncertainty—lead to better in-context learning performance across different model sizes?
In-context learning lets you put query-specific examples in the prompt, but which examples? AIPD (Active In-context Prompt Design) frames this as active learning with optimal experimental design. Starting from an unlabeled training pool, it adaptively chooses the examples whose labels would most reduce the LLM's prediction uncertainty on the test set, then pays to label only those. Two algorithms instantiate the idea. GO chooses examples to minimize the posterior predictive variance for any test example, and SAL simulates the impact of labeling each unlabeled example. Both are analyzed in linear models and shown to outperform other few-shot selection methods across small, medium, and large LMs.
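A minimal sketch of GO-style selection in the linear setting the note mentions. It assumes a Bayesian linear model over fixed example embeddings with a Gaussian prior (`prior_var`) and Gaussian label noise (`noise_var`). It also reads GO's criterion as G-optimal design, i.e. minimizing the largest posterior predictive variance over the test set. That reading, the function names, and the random data are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def posterior_update(cov, x, noise_var):
    """Posterior covariance after observing one more example x (Sherman-Morrison rank-one update).

    In a Bayesian linear model this update does not depend on the label y.
    """
    cov_x = cov @ x
    return cov - np.outer(cov_x, cov_x) / (noise_var + x @ cov_x)

def max_test_variance(cov, test_x):
    """Largest posterior predictive variance z^T cov z over the test rows z."""
    return float(np.max(np.einsum("ij,jk,ik->i", test_x, cov, test_x)))

def greedy_g_optimal(pool_x, test_x, budget, prior_var=1.0, noise_var=0.1):
    """Hypothetical helper: greedily pick `budget` pool indices, each lowering the worst test variance most."""
    cov = prior_var * np.eye(pool_x.shape[1])
    chosen = []
    for _ in range(budget):
        best_idx, best_cov, best_score = None, None, np.inf
        for i in range(len(pool_x)):
            if i in chosen:
                continue
            candidate_cov = posterior_update(cov, pool_x[i], noise_var)
            score = max_test_variance(candidate_cov, test_x)
            if score < best_score:
                best_idx, best_cov, best_score = i, candidate_cov, score
        chosen.append(best_idx)
        cov = best_cov
    return chosen, cov

# Usage on made-up random embeddings: only the chosen pool items would be sent for labeling.
rng = np.random.default_rng(0)
pool_x = rng.normal(size=(50, 5))
test_x = rng.normal(size=(10, 5))
chosen, cov = greedy_g_optimal(pool_x, test_x, budget=4)
print("label these pool items:", chosen)
print("worst test variance:", max_test_variance(np.eye(5), test_x), "->", max_test_variance(cov, test_x))
```

The covariance update never touches a label, so in this linear-Gaussian toy the whole selection can be planned before anything is labeled. With an actual LLM there is no such closed-form update, so the effect of labeling a candidate has to be estimated some other way, for instance by simulation as SAL does.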
The keeper is the reframing of demonstration selection as a budgeted experimental-design problem: under a labeling budget, the right examples are the ones that most reduce uncertainty over the actual test distribution, not the ones most superficially similar to the query. This is principled, test-set-aware example selection rather than heuristic retrieval.
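The contrast with similarity-based retrieval can be shown on a hand-built toy under hypothetical assumptions: a Bayesian linear model over made-up 2-D embeddings, dot-product similarity as the retrieval heuristic, and the sum of posterior predictive variances over the test set as the uncertainty being reduced. With a labeling budget of two, the sketch compares which pool rows each rule spends the budget on. All names and data here are illustrative.

```python
import numpy as np

def total_test_variance(selected_x, test_x, prior_var=1.0, noise_var=0.1):
    """Sum of posterior predictive variances z^T Sigma z over test rows z, Bayesian linear model."""
    precision = np.eye(test_x.shape[1]) / prior_var
    for x in selected_x:
        precision += np.outer(x, x) / noise_var
    cov = np.linalg.inv(precision)
    return float(np.einsum("ij,jk,ik->", test_x, cov, test_x))

def select_by_similarity(pool_x, query, budget):
    """Heuristic retrieval: the `budget` pool rows with the largest dot product with the query."""
    scores = pool_x @ query
    return [int(i) for i in np.argsort(-scores)[:budget]]

def select_by_uncertainty(pool_x, test_x, budget):
    """Budgeted greedy design: repeatedly add the row that most lowers total test variance."""
    chosen = []
    for _ in range(budget):
        remaining = [i for i in range(len(pool_x)) if i not in chosen]
        best = min(remaining, key=lambda i: total_test_variance(pool_x[chosen + [i]], test_x))
        chosen.append(best)
    return chosen

# Toy pool: rows 0-2 are near-duplicates close to the query, row 3 covers the other direction.
pool_x = np.array([[1.0, 0.0], [0.99, 0.1], [0.98, -0.1], [0.0, 1.0]])
test_x = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([1.0, 0.0])

for name, idx in [("similarity", select_by_similarity(pool_x, query, 2)),
                  ("uncertainty", select_by_uncertainty(pool_x, test_x, 2))]:
    print(name, idx, round(total_test_variance(pool_x[idx], test_x), 3))
```

On this pool the similarity rule spends both slots on near-duplicates of the query, leaving the second test direction almost as uncertain as under the prior, while the greedy design covers both directions. Greedy selection is only a heuristic for the underlying design objective and is not guaranteed to be optimal.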
This adds an active-learning lever to the vault's prompting/ICL thread. It complements "Does learning from mistakes improve in-context learning?" (LEAP, which extracts more from the examples already given) by addressing which examples to acquire in the first place, and its uncertainty-reduction objective echoes the information-gain question-selection work elsewhere in the vault.
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Do few-shot examples improve in-context learning or add noise?
- What makes a good in-context learning example for a given task?
- How does the Learning Law explain why all examples should contribute equally?
- Can data pruning and equal contribution be reconciled in optimal learning?
- How does active selection of training content differ from random reinforcement sampling?
Related concepts in this collection
- Does learning from mistakes improve in-context learning?
  Explores whether inducing models to make errors on few-shot examples, then having them articulate principles from those mistakes, leads to better performance than learning from correct examples alone.
  LEAP extracts more from given examples; AIPD chooses which examples to acquire
- Why do chain-of-thought examples fail across different conditions?
  Chain-of-thought exemplars show surprising sensitivity to order, complexity level, diversity, and annotator style. Understanding these brittleness dimensions could reveal what makes reasoning prompts robust or fragile.
  principled test-aware selection is a response to exemplar brittleness
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Experimental Design for Active Transductive Inference in Large Language Models
- In-Context Principle Learning from Mistakes
- Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
- Self-Adapting Language Models
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- Latent Skill Discovery for Chain-of-Thought Reasoning
- Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion
- Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models
Original note title
active in-context prompt design uses optimal experimental design to choose the most informative few-shot examples to label