Can Q-priming further strengthen clarifying question behavior beyond social meta-learning alone?
This explores whether 'priming' models on question-asking (pre-seeding the behavior before training) could add something on top of social meta-learning, which already makes models ask clarifying questions on its own — and the corpus suggests the two work on different layers of the problem.
Read plainly, the question is this: social meta-learning (SML) already gets clarifying-question behavior to emerge without explicit training, so could a priming step stack on top of it and push further? The corpus has a surprising amount to say, mostly because it pulls apart *where* clarifying behavior actually comes from. SML's result is that models trained only on fully specified problems still generalize to underspecified ones, learning to treat conversation as an information source rather than guessing (see 'Can models learn to ask clarifying questions without explicit training?'). That is emergent: the behavior was never the training target. The priming angle suggests a complementary mechanism. Research on knowledge priming finds that whether priming 'takes' after gradient updates is *predictable from the pre-update probability*, with a sharp threshold near 10^-3 and as few as three exposures needed to establish the effect (see 'Can we predict keyword priming before learning happens?'). In that light, Q-priming wouldn't replace SML; it would raise the prior probability of the question-asking move so SML's signal has something to amplify.
But there is reason to doubt that priming alone does the heavy lifting: priming and prompting only *activate* what is already latent. Prompt optimization cannot inject knowledge a model lacks; it reorganizes the existing distribution and hits a hard ceiling (see 'Can prompt optimization teach models knowledge they lack?'). So if clarifying competence isn't already somewhere in the model, no amount of priming conjures it. You still need a training paradigm like SML, or the reinforcement-learning route that lifted proactive critical-thinking accuracy, the 'spot the missing information' skill, from 0.15% to 73.98% on deliberately flawed math problems (see 'Can models learn to ask clarifying questions instead of guessing?'). Tellingly, that same work found the capability is *fragile*: inference-time scaling degraded it in untrained models but improved it after RL, which is the fingerprint of a behavior that has to be genuinely instilled, not merely primed.
The deeper reason both matter is that the dominant training regime actively fights clarifying behavior. Standard RLHF rewards confident single-turn answers, eroding the 'grounding acts' (understanding checks and clarifying questions) that reliable dialogue depends on and leaving them 77.5% below human levels (see 'Does preference optimization harm conversational understanding?'). Next-turn reward optimization compounds this: optimizing for immediate helpfulness trains models to answer passively rather than discover intent, and only multi-turn-aware rewards reverse it (see 'Why do language models respond passively instead of asking clarifying questions?'). So 'beyond SML alone' may be the wrong frame. The real adversary is the preference-optimization tax, and any priming or meta-learning gain has to survive it.
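The contrast between next-turn and multi-turn-aware rewards can be made concrete with a toy example. This is a minimal sketch under stated assumptions, not CollabLLM's actual reward estimator: the `Turn` class, both scoring functions, the discount factor, and every reward number are hypothetical, and a plain discounted sum stands in for whatever long-term value estimate a real system would use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    """One assistant turn with a hypothetical per-turn reward."""
    action: str    # "answer" or "clarify"
    reward: float  # illustrative score, not measured data


def next_turn_reward(trajectory: list[Turn]) -> float:
    """Score only the first assistant turn (immediate helpfulness)."""
    return trajectory[0].reward


def multi_turn_reward(trajectory: list[Turn], discount: float = 0.9) -> float:
    """Discounted sum of rewards over the whole conversation."""
    return sum(t.reward * discount ** i for i, t in enumerate(trajectory))


# Made-up trajectories: guessing looks helpful right away but misses the
# user's intent; asking first scores low immediately and pays off later.
guess_first = [Turn("answer", 0.6), Turn("answer", 0.1)]
ask_first = [Turn("clarify", 0.2), Turn("answer", 1.0)]

for name, traj in (("guess_first", guess_first), ("ask_first", ask_first)):
    print(name, next_turn_reward(traj), round(multi_turn_reward(traj), 2))
```

With these invented numbers the next-turn score prefers guessing while the multi-turn score prefers asking, which is the reversal the paragraph describes. The real size of that gap depends on how future turns are simulated and scored.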
What would 'beyond' actually look like? The corpus hints that quality, not just frequency, is the frontier. ALFA shows that decomposing question quality into theory-grounded attributes (clarity, relevance, specificity) and training on attribute-specific preferences beats optimizing a single 'good question' score, especially in clinical reasoning, where the right clarifying question changes the decision (see 'Can models learn to ask genuinely useful clarifying questions?'). That's a different axis from SML: SML makes a model *ask*, ALFA makes it ask *well*. A plausible synthesis the corpus points toward is layered: prime to raise the behavior's prior, meta-learn so it generalizes to underspecified inputs, then shape question quality with attribute-level rewards, rather than expecting any single lever, priming included, to carry it.
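To show what 'attribute-specific preferences' might look like as data, here is a minimal sketch. It assumes a simple pair schema and is not ALFA's actual data format: `PreferencePair`, `group_by_attribute`, and the example questions are all hypothetical. Only the three attribute names come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Attribute names taken from the text; everything else is illustrative.
ATTRIBUTES = ("clarity", "relevance", "specificity")


@dataclass(frozen=True)
class PreferencePair:
    """A preferred and a rejected question, judged on ONE attribute."""
    context: str
    chosen: str
    rejected: str
    attribute: str


def group_by_attribute(pairs):
    """Bucket pairs so each attribute can be optimized separately."""
    grouped = defaultdict(list)
    for pair in pairs:
        if pair.attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {pair.attribute}")
        grouped[pair.attribute].append(pair)
    return dict(grouped)


# Invented examples, for illustration only.
pairs = [
    PreferencePair(
        context="Patient reports chest pain.",
        chosen="When did the pain start, and does it spread to your arm or jaw?",
        rejected="Can you tell me more?",
        attribute="specificity",
    ),
    PreferencePair(
        context="Patient reports chest pain.",
        chosen="Does the pain get worse when you exert yourself?",
        rejected="Is the pain, like, the kind that sort of comes with doing stuff?",
        attribute="clarity",
    ),
]

grouped = group_by_attribute(pairs)
print({attr: len(items) for attr, items in grouped.items()})
```

Keeping the attribute on each pair is the design point: a single 'good question' score would collapse these buckets into one, and with them the signal about *why* one question was preferred.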
The thing worth taking away: 'clarifying question behavior' isn't one capability with one knob. It's a stack of four distinct problems: whether the model has the move latent, whether it generalizes, whether the reward regime suppresses it, and whether the question is any good. Q-priming touches only the first. That reframes the question from 'can priming beat SML' to 'which layer is your bottleneck.'
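The 'which layer is your bottleneck' framing can be written down as a small diagnostic. This is a sketch of one reading of the text, not an established procedure: the `Layer` enum, the `LEVER` mapping, and `first_bottleneck` are hypothetical names, and the boolean checks stand in for whatever evaluations a team would actually run.

```python
from enum import Enum
from typing import Dict, Optional


class Layer(Enum):
    """The four layers named in the text, in stack order."""
    LATENT = "the model has the question-asking move latent"
    GENERALIZES = "the behavior generalizes to underspecified inputs"
    REWARD_PERMITS = "the reward regime does not suppress it"
    QUALITY = "the questions asked are good ones"


# Lever the text associates with each layer (an interpretation of it).
LEVER = {
    Layer.LATENT: "Q-priming",
    Layer.GENERALIZES: "social meta-learning",
    Layer.REWARD_PERMITS: "multi-turn-aware rewards",
    Layer.QUALITY: "attribute-level preference training",
}


def first_bottleneck(checks: Dict[Layer, bool]) -> Optional[Layer]:
    """Return the lowest layer whose check fails, or None if all pass.

    A layer missing from `checks` is treated as failing.
    """
    for layer in Layer:  # Enum iteration follows definition order
        if not checks.get(layer, False):
            return layer
    return None


checks = {
    Layer.LATENT: True,
    Layer.GENERALIZES: True,
    Layer.REWARD_PERMITS: False,
    Layer.QUALITY: True,
}
bottleneck = first_bottleneck(checks)
if bottleneck is not None:
    print(f"Bottleneck: {bottleneck.value}; lever to try: {LEVER[bottleneck]}")
```

In this invented scenario the move is latent and generalizes, so more priming would not help. The failing layer is the reward regime.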
Sources (7 notes)
Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.
Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically leaves grounding acts 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.