SYNTHESIS NOTE

Can optimal experimental design improve few-shot example selection?

Rather than picking examples by similarity, could actively selecting the most informative unlabeled examples—those that reduce the model's prediction uncertainty—lead to better in-context learning performance across different model sizes?

Synthesis note · 2026-06-03 · sourced from Prompts Prompting

In-context learning lets you put query-specific examples in the prompt, but which examples? AIPD (active in-context prompt design) frames the choice as active learning with optimal experimental design: starting from an unlabeled training pool, adaptively choose the examples whose labels would most reduce the LLM's prediction uncertainty across the test set, then pay to label only those. Two algorithms instantiate it: GO, which picks examples so that the posterior variance stays low for every test example, and SAL, which simulates the impact of labeling each unlabeled example. Both are analyzed in linear models and reported to outperform other few-shot selection methods across small, medium, and large LMs.
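The GO idea can be illustrated in the linear setting where it is analyzed. What follows is a minimal sketch, not the AIPD authors' implementation: it assumes each example is represented by a fixed feature vector (for instance an embedding), a Bayesian linear model with Gaussian prior and noise, and a greedy one-at-a-time selection rule. The function name `greedy_g_optimal` and the parameters `prior_var` and `noise_var` are hypothetical. In this model the posterior covariance depends only on which inputs are labeled, not on the label values, which is why selection can happen before any label is paid for.

```python
import numpy as np

def greedy_g_optimal(pool, test, budget, prior_var=1.0, noise_var=1.0):
    """Greedily choose `budget` rows of `pool` to label so that the largest
    posterior predictive variance over the rows of `test` is minimized.

    Assumes a Bayesian linear model y = x @ theta + noise, with prior
    theta ~ N(0, prior_var * I) and noise variance `noise_var`.
    """
    d = pool.shape[1]
    cov = prior_var * np.eye(d)  # posterior covariance before any labels
    chosen = []
    for _ in range(min(budget, len(pool))):
        best_i, best_score, best_cov = None, np.inf, None
        for i in range(len(pool)):
            if i in chosen:
                continue
            x = pool[i]
            cx = cov @ x
            # Sherman-Morrison update: covariance after labeling example i.
            new_cov = cov - np.outer(cx, cx) / (noise_var + x @ cx)
            # Worst-case predictive variance x*^T Sigma x* over the test set.
            score = np.max(np.einsum("ij,jk,ik->i", test, new_cov, test))
            if score < best_score:
                best_i, best_score, best_cov = i, score, new_cov
        chosen.append(best_i)
        cov = best_cov
    worst_var = float(np.max(np.einsum("ij,jk,ik->i", test, cov, test)))
    return chosen, worst_var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(50, 8))   # stand-in features for unlabeled examples
    test = rng.normal(size=(20, 8))   # stand-in features for test queries
    picked, worst = greedy_g_optimal(pool, test, budget=5)
    print("label these pool indices:", picked)
    print("worst-case test variance after labeling:", worst)
```

The selected indices are the examples one would send for labeling and then place in the prompt. A real LLM is not a linear model, so this sketch only shows the shape of the objective (test-set-aware uncertainty reduction under a budget), not how uncertainty would be estimated from an actual model.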

The keeper is the reframing of demonstration selection as a budgeted experimental-design problem: under a labeling budget, the right examples are the ones that most reduce uncertainty over the actual test distribution, not the ones most superficially similar to the query. This is principled, test-set-aware example selection rather than heuristic retrieval.

This adds an active-learning lever to the vault's prompting/ICL thread. It complements "Does learning from mistakes improve in-context learning?" (LEAP, which extracts more from the examples already given) by addressing which examples to acquire in the first place. Its uncertainty-reduction objective also echoes the information-gain question-selection work elsewhere in the vault.

Inquiring lines that read this note 7

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Can prompting inject entirely new knowledge into language models?

Do few-shot examples improve in-context learning or add noise?

How do training data properties shape reasoning capability development?

What makes a good in-context learning example for a given task?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

Why does reinforcement learning suppress output diversity compared to supervised fine-tuning?

How does active selection of training content differ from random reinforcement sampling?

How does example difficulty affect learning efficiency in language models?

Can learned priors effectively select and weight ensemble members by inference budget?

How do training priors constrain what context information can override?

Why is in-context learning brittle to the order of examples presented?

Related concepts in this collection 2

Does learning from mistakes improve in-context learning? Explores whether inducing models to make errors on few-shot examples, then having them articulate principles from those mistakes, leads to better performance than learning from correct examples alone.
LEAP extracts more from given examples; AIPD chooses which examples to acquire
Why do chain-of-thought examples fail across different conditions? Chain-of-thought exemplars show surprising sensitivity to order, complexity level, diversity, and annotator style. Understanding these brittleness dimensions could reveal what makes reasoning prompts robust or fragile.
principled test-aware selection is a response to exemplar brittleness
