TOPIC

Self-Refinement and Self-Consistency

15 synthesis notes · 94 source papers
When should an agent actually stop and deliberate?

How can models detect when deliberation over action choices is genuinely needed versus wasteful? This matters because unbounded action spaces make universal deliberation intractable, yet skipping it entirely risks missing critical errors.

Can language models improve themselves without any external training data?

Explores whether two language models playing against each other, one generating questions and one solving them, can create a self-improving loop. This matters because such a loop would eliminate dependence on human-labeled datasets.

Can crowdsourced votes reliably rank language models?

Explores whether large-scale human preference voting from casual users produces valid model rankings comparable to expert judgment, and what makes such crowdsourced evaluation trustworthy at scale.

Do all AI skills improve equally as models scale?

Different evaluation skills show strikingly different scaling patterns. Understanding where skills saturate has immediate implications for model deployment and capability requirements across domains.

Can model confidence work as a reward signal for reasoning?

Explores whether using a language model's own confidence scores as training rewards can simultaneously improve reasoning accuracy and restore calibration that standard RLHF damages.

Can models improve themselves on tasks without verifiable answers?

Most self-improvement methods require verifiable correctness signals like math or code. Can models improve on open-ended instruction tasks where right answers aren't automatically checkable? And what minimal training is needed to unlock this?

Do retrieval models actually follow natural language instructions?

Most information retrieval (IR) systems ignore instructions that define relevance, despite using LLM backbones. This raises questions about whether retrievers can adapt to nuanced user-specified information needs in practice.

Does self-consistency reliably reward correct answers during training?

Self-consistency initially correlates with correctness, but as models train on this signal, do they eventually learn to maximize consistency itself rather than accuracy? When does this proxy reward stop working?

Does self-generated training data improve model learning?

Can models learn more effectively from training data they generate themselves rather than data created by external sources? This explores whether a learner's own restructuring process produces better learning outcomes.

What limits how much models can improve themselves?

Explores whether self-improvement has fundamental boundaries set by how well models can verify versus generate solutions, and what this means across different task types.

Why do self-improvement loops eventually stop improving?

Self-improvement systems often plateau because the evaluator that judges progress stays static while the actor grows. What happens when judges don't improve alongside learners?

Why does self-correction training on offline data fail?

Can language models learn to correct their own mistakes through supervised training on correction examples? This explores whether distribution mismatch and behavior collapse prevent self-correction from emerging.

Can models learn to ask better clarifying questions through self-improvement?

This explores whether question-asking is a trainable skill that improves when models are rewarded for questions that lead to better answers. It matters because asking good clarifying questions could help AI systems handle underspecified user requests.

Can models reliably improve themselves without external feedback?

Explores whether self-improvement alone can sustain progress or if structural limits—like the generation-verification gap and diversity collapse—require external anchoring to work reliably.

Can AI systems improve their own learning strategies?

Current self-improvement relies on fixed human-designed loops that break when tasks change. The question is whether agents can develop their own adaptive metacognitive processes instead of depending on human intervention.

Source papers 94

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.