TOPIC

Training Data

9 synthesis notes · 11 source papers
View as

Can agents learn from their own actions without external rewards?

Explores whether future states produced by an agent's own decisions can serve as supervision signals, bridging the gap between passive imitation learning and reward-dependent reinforcement learning.

Explore related Read →

What can a bounded observer actually learn from data?

Classical information measures treat all high-entropy content equally, but computationally bounded learners can only extract certain types of structure. What distinguishes learnable regularity from random noise that bounded agents face?

Explore related Read →

Can agents learn beyond what their training data shows?

Explores whether supervised fine-tuning on expert demonstrations creates a hard ceiling on agent competence, or whether agents can generalize to scenarios their curators never captured.

Explore related Read →

Can reconstructing expert thinking improve reasoning transfer?

Expert texts show only the final result of complex thinking. Can we reverse-engineer those hidden thought processes and use them to train models that reason better across different domains?

Explore related Read →

Can synthetic data replace seed examples in task generation?

Can models generate high-quality synthetic data for novel tasks without relying on existing input-output exemplars? This matters because many specialized domains lack training examples to work from.

Explore related Read →

How do quality, diversity, and complexity affect synthetic data differently?

When training models on synthetic data, do quality, diversity, and complexity each play distinct roles in how well models generalize? Understanding their separate effects could explain why current optimization strategies fail.

Explore related Read →

Can we generate synthetic data without any seed examples?

Existing synthetic data methods rely on seed examples from the target distribution, which is impractical for novel domains. Can taxonomic decomposition eliminate this dependence while maintaining controllable coverage?

Explore related Read →

Why do Shannon and Kolmogorov measures fail to value data?

Shannon information and Kolmogorov complexity assume unlimited computational capacity. But do these classical measures actually capture what bounded learners can extract from real data?

Explore related Read →

Why do language models need so much more text than humans?

Language models train on the surface of written text, but humans learn by inferring the underlying thoughts behind what they read. Does this explain why models need vastly more data to reach human-level understanding?

Explore related Read →

Source papers 11

The Arxiv papers behind this sub-topic. Links may take you off-site to arxiv.org.