Can lightweight adapters replace millions of personalized models?
Explores whether PEFT adapters can serve as persistent behavioral state that makes one shared base model function as millions of personalized models, and what scaling conditions make this possible.
The standard story about LoRA is economic: it is a cheaper substitute for full fine-tuning. This paper proposes a different role: PEFT as persistent local state layered on a strong shared base. The base supplies general competence; the adapter carries the learned consequences of repeated experience with one user: preferences, skills, tool habits, memory-like updates. The provocative phrasing "million personal models of trillion parameters" explicitly does not mean millions of owned checkpoints; it means a few trillion-scale bases plus millions of lightweight adapters serving as durable behavioral deltas. The thesis holds only if three scaling axes reinforce one another: Scale Up (a stronger base makes small updates more useful), Scale Down (how small the adaptive state can get while still learning reliably), and Scale Out (turning repeated updates into served populations). Remove any one axis and the thesis collapses.
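The difference between "millions of owned checkpoints" and "a few bases plus millions of adapters" can be made concrete with rough parameter arithmetic. The sketch below is only an illustration: the function names, the adapter layout (rank-8 LoRA pairs on 100 square weight matrices), and every number are assumptions chosen for the example, not figures from the paper. The one fact it relies on is that a LoRA pair for a d_out x d_in weight at rank r has r * (d_out + d_in) parameters.

```python
def lora_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in one LoRA pair: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


def population_params(base_params: int, adapter_params: int,
                      users: int, bases: int = 1) -> tuple[int, int]:
    """Return (owned, shared) parameter totals for a user population.

    owned  = every user keeps a full copy of the base
    shared = a few bases plus one small adapter per user
    """
    owned = users * base_params
    shared = bases * base_params + users * adapter_params
    return owned, shared


# Hypothetical, illustrative numbers (not taken from the paper).
BASE = 10**12   # one trillion-parameter base
USERS = 10**6   # one million users
# Assumed layout: rank-8 adapters on 100 weight matrices of size 8192 x 8192.
ADAPTER = 100 * lora_adapter_params(8192, 8192, rank=8)

owned, shared = population_params(BASE, ADAPTER, USERS)
print(f"adapter parameters per user: {ADAPTER:,}")
print(f"owned checkpoints:           {owned:,}")
print(f"shared base + adapters:      {shared:,}")
print(f"reduction factor:            {owned / shared:,.0f}x")
```

Under these assumed sizes the adapters in aggregate hold more parameters than the single shared base, which is one way to see why the Scale Down axis (how small the per-user state can get) matters as much as the size of the base.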
The framing matters because the vault's personalization thread keeps hitting the wall this paper names: prompts, retrieval, and profiles "help but are not enough" because they do not persist and reshape future behavior. "Why does chain-of-thought reasoning fail for personalization?" supplies the cautionary detail: naive personalization fine-tuning destroys generalist capability, which is exactly why the adapter-as-bounded-state framing (not replacing the base, not storing the whole person) is the safer design. "Can models dynamically activate expert skills at inference time?" is the composability precedent the population vision needs, and it warns that LoRA adapters interfere when composed, whereas orthogonal SVF vectors do not. That is a real obstacle to "compose at population scale."
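One way to picture the interference caveat is with a toy linear-algebra example. This is not the mechanism from either paper, only a minimal sketch under simplifying assumptions: each adapter is a rank-1 delta b a^T, adapter 1 "reads" the input direction a1, and cross-talk is measured as how strongly a second adapter fires on that same direction. All names and vectors here are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def rank1_delta_output(b, a, x):
    """Output added by a rank-1 delta (b a^T) when applied to input x."""
    scale = dot(a, x)
    return [bi * scale for bi in b]


def cross_talk(other_adapter, x_own):
    """Norm of the output another adapter adds on an input meant for ours."""
    b, a = other_adapter
    out = rank1_delta_output(b, a, x_own)
    return sum(o * o for o in out) ** 0.5


# Input direction that adapter 1 was (hypothetically) trained to respond to.
a1 = [1.0, 0.0, 0.0]

# Adapter 2, two variants: one whose read direction overlaps a1, one orthogonal.
overlapping = ([0.0, 1.0], [0.6, 0.8, 0.0])
orthogonal = ([0.0, 1.0], [0.0, 1.0, 0.0])

print("cross-talk, overlapping directions:", cross_talk(overlapping, a1))
print("cross-talk, orthogonal directions: ", cross_talk(orthogonal, a1))
```

When the two read directions overlap, adding the second adapter changes the output on inputs the first adapter was tuned for; when they are orthogonal, it adds nothing there. Independently trained low-rank deltas carry no guarantee of orthogonality, which is the intuition behind the composition worry.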
The Scale Up axis quietly rests on "How should finetuning scale with model and data size?": if fine-tuning gains track base-model scale, then bigger shared bases do make each tiny adapter more useful. The honest doubt is whether per-user adapters drift, go stale, or leak across the population once served at scale. Persistence is a liability as much as a feature, and the paper's bounded framing ("does not store the whole person, does not replace retrieval") reads partly as a hedge against that.
Inquiring lines that use this note as a source (8)
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- What happens when different harnesses project the same model?
- How do orthogonal adapter vectors avoid interference at scale?
- Can per-user adapters remain consistent without drifting or leaking?
- Does base model strength determine adapter usefulness across users?
- How do aligned LoRA adapters compose through parameter-space arithmetic?
- Does parameter composition work when adapter alignment is imperfect?
- Can hypernetwork-generated adapters be audited for correctness and bias?
- What cognitive burdens should move from model parameters into harness infrastructure?
Related concepts in this collection (3)
- How should finetuning scale with model and data size?
  What scaling laws govern finetuning performance across model size, pretraining data, and finetuning data? Understanding these relationships could guide resource allocation in real-world tuning scenarios.
  grounds (why a stronger base makes small adapters more useful: the Scale Up axis)
- Can models dynamically activate expert skills at inference time?
  Can language models efficiently discover and compose task-specific capabilities on the fly without modifying base weights? This explores whether test-time adaptation through expert vector composition outperforms fixed fine-tuning approaches.
  extends (composability the population vision needs; LoRA-interference caveat)
- Why does chain-of-thought reasoning fail for personalization?
  Standard reasoning traces produce logically sound but personally irrelevant answers. This explores why generic thinking doesn't anchor to user preferences and what might fix it.
  grounds (why adapters must be bounded state, not capability-replacing fine-tuning)
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
- ReFT: Representation Finetuning for Language Models
- Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning
- PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health
- Real-Time Procedural Learning From Experience for AI Agents
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes
- Transformer²: Self-adaptive LLMs
- AutoGLM: Autonomous Foundation Agents for GUIs
Original note title
PEFT reframed as persistent local state turns one shared base model into millions of personal models — adapters carry the consequences of experience the base cannot