INQUIRING LINE

The knowledge gap between teacher and student AI is the whole point — training data that erases it teaches nothing.

How should training data be constructed to preserve teacher-student information gaps?

This explores how to build training data so the teacher's privileged knowledge becomes a usable learning signal, rather than erasing the gap that makes teaching possible in the first place. The corpus takes a surprisingly coherent position on this, and it starts from a counterintuitive claim: the information gap is the point. One note argues that meaningful corrective feedback exists only *because* the teacher knows something the student doesn't, namely the correct answer or a verifier's output. Without that asymmetry, teacher and student share identical uncertainty and there is nothing to correct (Why does teacher-student information asymmetry enable learning signals?). So 'preserving the gap' isn't a side concern; it's the mechanism you're trying to harness.
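One way to make that asymmetry explicit in the data is to keep the privileged fields on each record but route them only into the teacher's context. This is a minimal sketch assuming a simple exact-match checker; the `Example` record, its field names, and the feedback wording are hypothetical, not taken from the corpus. The corrective message uses the reference answer to decide *whether* to correct without revealing it, so the gap is used rather than erased.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    problem: str
    reference_answer: str                  # privileged: teacher-only
    verifier_output: Optional[str] = None  # privileged: e.g. a checker's verdict


def teacher_context(ex: Example) -> str:
    """The teacher sees the problem plus every privileged signal."""
    lines = [f"Problem: {ex.problem}", f"Reference answer: {ex.reference_answer}"]
    if ex.verifier_output is not None:
        lines.append(f"Verifier output: {ex.verifier_output}")
    return "\n".join(lines)


def student_context(ex: Example, attempt: str) -> str:
    """The student sees only the problem and its own attempt."""
    return f"Problem: {ex.problem}\nYour attempt: {attempt}"


def corrective_feedback(ex: Example, attempt: str) -> Optional[str]:
    """Feedback is possible only because the teacher holds the reference.
    The message flags the error without handing over the answer."""
    if attempt.strip() == ex.reference_answer.strip():
        return None  # nothing to correct
    return "Your final answer does not match the reference; recheck your steps."


ex = Example(problem="Compute 2 + 3 * 4.", reference_answer="14",
             verifier_output="expected integer output")
print(student_context(ex, "20"))
print(corrective_feedback(ex, "20"))
print(corrective_feedback(ex, "14"))  # None
```

The design choice worth noting is the split between *deciding* (which needs the privileged reference) and *telling* (which deliberately withholds it).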

But here's the twist the corpus keeps returning to: you can overdo it. When teachers are conditioned on the correct answer and verifier output, they produce confident, concise traces, and students inherit that confidence wholesale. That looks like a win in-domain, but it quietly suppresses the student's ability to express uncertainty, which degrades generalization on out-of-distribution problems that actually require epistemic caution (Does richer teacher context hurt student generalization?). In other words, the teacher's privileged knowledge leaks into the *style* of the data, and the student copies the style without owning the underlying knowledge. The gap was supposed to generate a learning signal; instead it generated overconfidence.

The resolution running through several notes is that the student, not the teacher, should be the filter. Teacher-refined data, even when objectively higher quality, degrades performance when it exceeds what the student can actually absorb; students do better keeping only the refinements compatible with their own statistical profile (Does teacher-refined data always improve student model performance?). So data construction isn't 'dump the teacher's best output and distill'; it's 'expose the gap, then let the student selectively close the parts it can reach.' Walmart's case makes the upside concrete: BERT cross-encoders actually *beat* their LLM teachers once trained on large enough augmented sets of teacher-labeled queries, because exposure to a much broader input distribution, smoothed by the teacher's predictions, let the students generalize better than the teachers did (Can smaller models outperform their LLM teachers with enough data?). The teacher's role was to label breadth, not to be imitated.
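Student-side filtering can be sketched as follows. This is a minimal sketch, not the cited note's method: it assumes you already have the student's per-token log-probabilities for each response, computed with the instruction in context (`cond`) and without it (`uncond`). The ratio used here, instruction-conditioned perplexity over unconditioned perplexity, is loosely modeled on Instruction-Following Difficulty (IFD) scores; whether it matches the exact criterion in the underlying paper is an assumption. The function names, dict layout, and the `max_ppl_ratio` threshold are hypothetical.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """exp(mean negative log-likelihood) under the student model."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def ifd_ratio(cond: List[float], uncond: List[float]) -> float:
    """IFD-style ratio: PPL(response | instruction) / PPL(response).
    Values below 1 mean the instruction makes the response easier to predict."""
    return perplexity(cond) / perplexity(uncond)


def student_selects(original: Dict[str, List[float]],
                    refined: Dict[str, List[float]],
                    max_ppl_ratio: float = 1.5) -> str:
    """Return "refined" if the student can plausibly absorb the teacher's
    refinement, else "original". Both thresholds are illustrative assumptions."""
    # Frontier check: the refined response must not be far harder for the
    # student to predict (given the instruction) than the original response.
    within_frontier = (perplexity(refined["cond"])
                       <= max_ppl_ratio * perplexity(original["cond"]))
    # Relevance check: the instruction should still inform the refined response.
    instruction_informs = ifd_ratio(refined["cond"], refined["uncond"]) < 1.0
    return "refined" if within_frontier and instruction_informs else "original"


# Hypothetical student log-probs for one instruction and three candidate responses.
original = {"cond": [-1.0] * 4, "uncond": [-1.5] * 4}
easy_refinement = {"cond": [-1.2] * 4, "uncond": [-2.0] * 4}
hard_refinement = {"cond": [-2.5] * 4, "uncond": [-3.0] * 4}
print(student_selects(original, easy_refinement))  # refined
print(student_selects(original, hard_refinement))  # original
```

Scoring with the student's own log-probabilities, rather than a teacher-side quality score, is the point: a refinement the teacher rates highly can still be rejected when it sits beyond what the student can predict.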

There's a deeper reason this matters, and it's about where knowledge lives. Prompting and prompt optimization can only reorganize what a model already contains; they can't inject knowledge the training data never supplied (Can prompt optimization teach models knowledge they lack?). And models often ignore in-context information when their parametric priors are strong (Why do language models ignore information in their context?). That's exactly why the asymmetry has to be built into the *training data* rather than handled at inference: a gap you don't bake into the data is a gap you can't prompt your way across later. There are also gentler ways to move the distribution. Proxy-tuning at decoding time shifts behavior while leaving base weights (and the knowledge stored in them) intact, whereas direct fine-tuning corrupts knowledge stored in lower layers (Can decoding-time tuning preserve knowledge better than weight fine-tuning?). That makes it a useful lever when you want the teacher's signal to shape reasoning and style without overwriting what the student already knows.
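The decoding-time shift behind proxy-tuning can be sketched in a few lines. This is a minimal sketch assuming the usual proxy-tuning formulation: at each step, the large base model's next-token logits are offset by the difference between a small tuned "expert" and the same small model untuned (the "anti-expert"), then renormalized. The toy three-token vocabulary and the function names are hypothetical; a real decoder would apply this at every step over the full vocabulary.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_probs(base_logits: List[float],
                      expert_logits: List[float],
                      antiexpert_logits: List[float]) -> List[float]:
    """Next-token distribution: softmax(base + (expert - antiexpert)).
    The base model's weights are never touched; only its output scores shift."""
    if not (len(base_logits) == len(expert_logits) == len(antiexpert_logits)):
        raise ValueError("all three models must share one vocabulary")
    shifted = [b + (e - a) for b, e, a in
               zip(base_logits, expert_logits, antiexpert_logits)]
    return softmax(shifted)


# Toy 3-token vocabulary: the base model prefers token 0, but tuning made
# the small model strongly prefer token 1.
base = [2.0, 1.0, 0.0]
expert = [0.0, 3.0, 0.0]
antiexpert = [0.0, 0.0, 0.0]
probs = proxy_tuned_probs(base, expert, antiexpert)
print(max(range(len(probs)), key=probs.__getitem__))  # 1
```

When the expert and anti-expert produce identical logits, the offset is zero and the base distribution comes back unchanged, so the shift carries only what tuning changed in the small model. That is one intuition, not a proof, for why the base model's stored knowledge survives.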

The thing you might not have expected to learn: 'preserving the information gap' and 'making the student match the teacher' pull in opposite directions. The corpus suggests the best teacher-student data keeps the asymmetry as a *source of correction* while refusing to transfer the teacher's certainty: wide labeled coverage, student-side filtering, and distributional nudges rather than wholesale imitation.

Sources (7 notes)

Why does teacher-student information asymmetry enable learning signals?

Social meta-learning requires information asymmetry—the teacher's access to correct answers or verifier output—to generate meaningful corrective signals. Without this asymmetry, teacher and student share identical uncertainty, making pedagogical correction impossible.

Does richer teacher context hurt student generalization?

Teachers conditioned on correct answers and verifier output produce confident, concise traces that students inherit. This style suppresses uncertainty expression, optimizing in-domain performance while degrading generalization to out-of-distribution problems that require epistemic caution.

Does teacher-refined data always improve student model performance?

Teacher-refined data degrades performance when it exceeds the student's learning frontier, even if objectively higher quality. Students should filter refinements using their own statistical profile to retain only compatible improvements.

Can smaller models outperform their LLM teachers with enough data?

Walmart's student cross-encoders outperformed their LLM teachers when trained on sufficiently large augmented datasets of teacher-labeled queries. The student's broader input distribution exposure, smoothed by teacher predictions, enabled better generalization than the teacher achieved.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Papers this line draws on (8)

The research behind the notes this line reads, ranked by how closely each paper relates.

Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? · 2.42 match · arXiv
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts · 2.42 match · arXiv
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining · 2.42 match · arXiv
Learning To Retrieve Prompts for In-Context Learning · 1.68 match · arXiv
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning · 1.67 match · arXiv
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning · 1.60 match · arXiv
AI Meets the Classroom: When Does ChatGPT Harm Learning? · 1.53 match · arXiv
How new data permeates LLM knowledge and how to dilute it · 0.86 match · arXiv

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further; it asks the model to find newer work and re-test which earlier constraints still hold.

You are a capability analyst. The question: How should training data be constructed to preserve teacher-student information gaps as a usable learning signal, and has this regime shifted since early 2023?

What a curated library found, and when (dated claims, not current truth):
Library findings span 2023–2026, with heavy clustering in 2025–2026:
• Teacher-privileged knowledge (correct answers, verifier output) produces confident, short traces that students copy wholesale, suppressing epistemic uncertainty and degrading OOD generalization (~2024–2025).
• Student-side filtering of teacher-refined data outperforms wholesale distillation; students absorb only refinements compatible with their statistical profile (~2024–2025).
• BERT cross-encoders beat their LLM teachers when trained on large augmented sets of teacher-labeled queries: broad input coverage, smoothed by teacher predictions, let the students generalize better than the teachers did (~2024).
• Prompt optimization and in-context priming cannot inject knowledge absent from training data; they only activate pre-existing parametric knowledge (~2024–2025).
• Proxy-tuning at decoding time preserves parametric knowledge better than direct fine-tuning; distributional nudges outperform imitation (~2025).
• Post-training (RL, consistency training) can amplify or transmit hidden behavioral traits from pretraining, including sycophancy and jailbreak susceptibility (~2025–2026).

Anchor papers (verify; mind their dates):
- arXiv:2404.02060 (2024-04): Long-context LLMs struggle with long in-context learning.
- arXiv:2504.09522 (2025-04): How new data permeates LLM knowledge and dilution strategies.
- arXiv:2602.16488 (2026-02): Social meta-learning and language feedback.
- arXiv:2603.24472 (2026-03): Why self-distillation sometimes degrades reasoning.

Your task:
(1) RE-TEST EACH CONSTRAINT. For each of the six findings above, probe whether newer scaling, RL post-training orchestration (reward modeling, iterative refinement), knowledge-injection tooling (retrieval, fusion), or multi-stage training pipelines have RELAXED or OVERTURNED the claim that student filtering and proxy-tuning beat wholesale imitation. Separate the durable question (how to signal asymmetry without transferring overconfidence) from the perishable limitation (that direct distillation always fails). Cite what resolved each constraint, and say plainly where constraints still hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months, especially any showing that post-training alignment or iterative student feedback can *preserve* teacher-student gaps better than the library claims, or that wholesale imitation does work under certain curricula or reward structures.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Under multi-stage RL + KL-regularization against a pretrained base, does student-side filtering remain necessary, or does the regularizer preserve epistemic humility automatically? (b) Does synthetic data from forward-chaining or tree-search teachers (who express uncertainty during generation) avoid the overconfidence trap without explicit filtering?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
