Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges

Paper · arXiv 2502.01612 · Published February 3, 2025
LLM Architecture · Evolutionary Methods · Self-Refinement and Self-Consistency

Large language models often struggle with length generalization and with solving complex problem instances beyond their training distribution. We present a self-improvement approach in which models iteratively generate and learn from their own solutions, progressively tackling harder problems while keeping a standard transformer architecture. Across diverse tasks, including arithmetic, string manipulation, and maze solving, self-improvement enables models to solve problems far beyond their initial training distribution. For instance, models generalize from 10-digit to 100-digit addition without apparent saturation. We observe that in some cases, filtering for correct self-generated examples leads to exponential improvements in out-of-distribution performance across training rounds. Additionally, starting from pretrained models significantly accelerates this self-improvement process for several tasks. Our results demonstrate how controlled weak-to-strong curricula can systematically teach a model logical extrapolation without any changes to the positional embeddings or the model architecture.
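The generate, filter, and retrain loop described in the abstract can be sketched as a toy simulation. This is not the authors' code: `ToyAdder`, `sample_problem`, and `self_improve` are hypothetical names, the "model" is a stub that is exact up to its training length and only sometimes right one digit beyond it, and the filter compares each answer with the true sum as a stand-in for filtering for correct self-generated examples.

```python
import random


class ToyAdder:
    """Hypothetical stand-in for a trained transformer (not a real model).

    Exact on operands no longer than its training data, right only with
    probability 1 - error_rate one digit beyond that, always wrong further out.
    """

    def __init__(self, error_rate: float = 0.2, seed: int = 0):
        self.trained_len = 0          # longest operand length seen in training
        self.error_rate = error_rate  # error rate exactly one digit past trained_len
        self.rng = random.Random(seed)

    def fit(self, examples):
        # "Training" here only records the longest operand length in the data.
        for a, b, _ in examples:
            self.trained_len = max(self.trained_len, len(str(a)), len(str(b)))

    def predict(self, a: int, b: int) -> int:
        n = max(len(str(a)), len(str(b)))
        if n <= self.trained_len:
            return a + b
        if n == self.trained_len + 1 and self.rng.random() >= self.error_rate:
            return a + b
        return a + b + 1  # a deliberately wrong answer


def sample_problem(n_digits: int, rng: random.Random):
    # Two operands with exactly n_digits digits each.
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return rng.randint(lo, hi), rng.randint(lo, hi)


def self_improve(model, start_len: int, rounds: int, per_round: int = 100, seed: int = 0):
    rng = random.Random(seed)
    # Round 0: ground-truth labels for operands of 1..start_len digits.
    data = []
    for n in range(1, start_len + 1):
        for _ in range(per_round):
            a, b = sample_problem(n, rng)
            data.append((a, b, a + b))
    model.fit(data)

    for _ in range(rounds):
        n = model.trained_len + 1  # curriculum: one digit harder than the training data
        generated = []
        for _ in range(per_round):
            a, b = sample_problem(n, rng)
            generated.append((a, b, model.predict(a, b)))  # the model labels its own data
        # Keep only correct self-generated examples (exact check, assumed available).
        data.extend((a, b, y) for a, b, y in generated if y == a + b)
        model.fit(data)
    return model, data


model, data = self_improve(ToyAdder(seed=1), start_len=10, rounds=5)
print(model.trained_len)
```

Starting from 10-digit data, each round extends the training set by one digit, so after five rounds the stub's training length reaches 15. In a real run, `fit` would be fine-tuning and `predict` would be model decoding. The exact check would be replaced by whatever filter is actually available, since the source reports the strongest gains specifically when the kept examples are correct.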

Introduction. Despite the remarkable success of transformer-based language models (Vaswani et al., 2017) across a wide range of tasks, these models exhibit significant limitations in length generalization—the ability to extrapolate to longer sequences than those seen during training. Even in simple algorithmic tasks such as arithmetic, standard transformer models trained with autoregressive objectives struggle to generalize to longer problem instances (Dubois et al., 2019; Hupkes et al., 2020; Newman et al., 2020; Anil et al., 2022). To address this, prior work has explored various approaches, including changes to positional embeddings (Ruoss et al., 2023; Li et al., 2023; McLeish et al., 2024; Kazemnejad et al., 2024; Sabbaghi et al., 2024; Cho et al., 2024; Zhou et al., 2024), architectural modifications (Fan et al., 2024; Duan et al., 2023), and data format changes such as index hinting (Zhou et al., 2023, 2024).

Discussion / Conclusion. A key consideration in self-improvement is how to define and quantify task difficulty. In real-world domains such as mathematics and natural language tasks, formalizing "difficulty" remains an open question. Our experiments demonstrate that careful difficulty scheduling is crucial for effective self-improvement. However, we also find that models exhibit some robustness to difficulty slack, especially when trained on harder tasks (Section 7.1) and when starting from pretrained models (Section 7.3). Another fundamental assumption in our framework is that models can handle tasks slightly harder than those seen in training. While this holds in many structured tasks, there are cases where such generalization is inherently difficult. For example, training on raw multiplication problems without intermediate steps leads to poor out-of-distribution (OOD) generalization, which makes self-improvement infeasible. However, we show that breaking tasks down into intermediate steps enables slight OOD generalization, which can then be leveraged for self-improvement (Section 6.2). This highlights the importance of designing task representations that align with a model's inherent generalization capabilities.
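To illustrate what "breaking down tasks into intermediate steps" can look like for multiplication, here is one hypothetical target format. It expands `a * b` into one partial product per digit of `b`, each followed by a running sum. The function name `multiply_with_steps` and the string layout are assumptions made for illustration; the exact format the authors use in Section 6.2 is not reproduced here.

```python
def multiply_with_steps(a: int, b: int) -> tuple[str, int]:
    """Render a*b as partial products plus running sums (a hypothetical format).

    "a*d e i" means a times digit d of b, shifted left by i decimal places.
    Returns the rendered target string and the final product.
    """
    steps = []
    total = 0
    # Walk the digits of b from least to most significant.
    for i, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** i
        total += partial
        steps.append(f"{a}*{digit}e{i}={partial}; sum={total}")
    return f"{a}*{b}: " + " | ".join(steps) + f" => {total}", total


text, product = multiply_with_steps(123, 45)
print(text)  # 123*45: 123*5e0=615; sum=615 | 123*4e1=4920; sum=5535 => 5535
```

Compared with emitting the product directly, each step in a target like this only requires a single-digit multiplication and one addition. The premise stated above is that representations of this kind give the model the slight out-of-distribution generalization that the self-improvement loop needs.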