On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper · arXiv 2606.02437 · Published June 1, 2026
Training and Fine-Tuning

Parameter-efficient fine-tuning (PEFT) is usually evaluated as a cheaper alternative to full fine-tuning. This paper studies a broader possibility: whether small trainable adapters can serve as persistent local state on top of strong shared foundation models. In this view, a base model supplies common competence, while adapters may carry part of an instance-specific behavioral state, such as preferences, skills, tool habits, or memory-like updates. This framing is deliberately bounded: PEFT does not store the whole person or replace retrieval, but it may provide a compact unit of adaptive state that can be trained, evaluated, served, and composed at population scale. We study this possibility through three coupled scaling problems that must reinforce one another. Scale Up asks whether a stronger shared base model makes small local updates more useful. We study large-prior LoRA reinforcement learning, routing-aware correction, and training–serving consistency at trillion-scale MoE. Scale Down asks how small the local adaptive state can become while still learning reliably. We analyze rank regimes, low-rank instability, RL-native initialization, hyperparameter transfer, and memory-oriented adapter designs such as δ-mem.
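
For readers less familiar with what "rank" refers to here, the standard LoRA parameterization (Hu et al., 2021) can be written as follows. This is the generic formulation from the LoRA paper, not necessarily the exact variant this work trains; the symbols W_0, A, B, r, and α are the conventional ones and do not appear in the abstract itself.

```latex
% Standard LoRA update for a frozen weight matrix W_0 (Hu et al., 2021).
% Only A and B are trained; W_0 is shared and stays fixed.
\[
  W \;=\; W_0 + \Delta W \;=\; W_0 + \frac{\alpha}{r}\, B A,
  \qquad
  W_0 \in \mathbb{R}^{d \times k},\;
  B \in \mathbb{R}^{d \times r},\;
  A \in \mathbb{R}^{r \times k},\;
  r \ll \min(d, k).
\]
% Trainable parameters per adapted matrix: r(d + k), versus dk for full fine-tuning.
```

Under this formulation, Scale Down amounts to asking how far r (and the number of adapted matrices) can shrink before learning becomes unreliable.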

Introduction. Frontier models can now write production code, operate tools, and reason across long contexts (OpenAI, 2025; GLM-5 Team, 2026; Kimi Team, 2025; Qwen Team, 2025; Anthropic, 2025a). Agentic systems built on these models resolve real-world software engineering tasks autonomously (Anthropic, 2025b; Wang et al., 2024; Jimenez et al., 2024). But a capable assistant is not automatically a personal one. It may answer more questions and call more tools, and still fail to preserve continuity with one person over time. Long context, retrieval (Lewis et al., 2021), prompts (Ji, 2025), and user profiles (Li et al., 2024) all help, but are not enough by themselves. A personal model needs state that can persist, adapt, and shape future behavior (Yao, 2025; Silver and Sutton, 2025). This paper argues that parameter-efficient fine-tuning (PEFT), especially LoRA (Hu et al., 2021), is a prac- We organize the technical path around three coupled scaling problems. Scale Up asks how to make strong shared base models repeatedly adaptable.

Discussion / Conclusion. The phrase “million personal models of trillion parameters” should not be read as a claim that each user owns and trains a separate trillion-parameter checkpoint. The intended architecture is different. A small number of strong trillion-scale base models provide shared capability, while millions of lightweight adapters provide persistent local adaptive state. The base model carries general reasoning, world knowledge, language competence, and tool-use priors. The adapter carries part of the learned consequences of repeated experience, such as memories, preferences, skills, and policies. This architecture depends on all three scaling axes at once. Scale Up makes the shared base model worth adapting. Scale Down makes each update cheap and stable enough to repeat. Scale Out turns repeated updates into persistent populations. Removing any axis breaks the thesis. Weak base models limit what adapters can learn. Expensive adapters prevent continuous adaptation.
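
The shared-base, per-user-adapter layout described above can be illustrated with a minimal NumPy sketch. It assumes a single linear layer and standard LoRA adapters; the class `LoRAAdapter`, the `forward` helper, the user IDs, and all sizes are hypothetical names chosen for illustration, not the paper's implementation or serving stack.

```python
import numpy as np

# Hypothetical toy sizes; a real layer would be far larger.
D_IN, D_OUT, RANK, ALPHA = 64, 64, 4, 8.0


class LoRAAdapter:
    """Per-user low-rank state: delta_W = (alpha / rank) * B @ A."""

    def __init__(self, d_out, d_in, rank, alpha, rng):
        # Standard LoRA init: A is small random, B is zero, so delta_W starts at 0.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def delta(self, x):
        # Apply the low-rank update without materializing the full d_out x d_in matrix.
        return self.scale * (x @ self.A.T) @ self.B.T

    def num_params(self):
        return self.A.size + self.B.size


def forward(W0, adapter, x):
    """Shared frozen base weight plus one user's adapter."""
    return x @ W0.T + adapter.delta(x)


rng = np.random.default_rng(0)
W0 = rng.normal(size=(D_OUT, D_IN))  # one shared base weight, never modified
adapters = {
    uid: LoRAAdapter(D_OUT, D_IN, RANK, ALPHA, rng)
    for uid in ("user_a", "user_b")  # hypothetical user IDs
}

x = rng.normal(size=(2, D_IN))
base_out = x @ W0.T
fresh_out = forward(W0, adapters["user_a"], x)  # equals base_out: B is still zero

# Stand-in for a training step: only user_a's adapter changes.
adapters["user_a"].B += 0.1 * rng.normal(size=(D_OUT, RANK))
adapted_out = forward(W0, adapters["user_a"], x)
untouched_out = forward(W0, adapters["user_b"], x)

print("base params:", W0.size)
print("adapter params per user:", adapters["user_a"].num_params())
```

In this toy setting each user adds RANK * (D_IN + D_OUT) parameters on top of a base weight that is stored once, which is the storage pattern the "million personal models" phrasing relies on. Updating one user's adapter leaves the base weight and every other user's behavior unchanged.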