INQUIRING LINE

Training, RL, and Test-Time Scaling · Model Architecture and Internals · Reasoning, Retrieval, and Evaluation · cross-cluster

How do aligned LoRA adapters compose through parameter-space arithmetic?

This explores whether you can take several LoRA adapters, each fine-tuned on top of an already-aligned base, and combine them by literally adding or merging their weight deltas instead of retraining a single multi-skill model.

The question treats a LoRA adapter as a small bundle of weight changes that you could add to others like terms in an equation. The corpus has no paper on adapter arithmetic in the literal 'add the vectors' sense, so the honest answer is that it speaks to the *preconditions* for composition more than to the operation itself, and that turns out to be the more interesting story. The foundational move is to reframe an adapter as a durable behavioral delta: a portable diff that carries learned experience on top of one shared base, so that millions of lightweight adapters can stand in for millions of full models (Can lightweight adapters replace millions of personalized models?). Once adapters are deltas, asking whether they add together is the natural next question.
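To make "adapter as a portable diff" concrete, here is a minimal sketch assuming the common LoRA parameterization W' = W + (alpha / r) · B @ A. The helpers `make_adapter`, `delta`, and `apply_adapter`, and all sizes, are hypothetical and chosen only for illustration; this is not any library's API. It shows that each adapter stores only two thin matrices, that the shared base is never modified, and how much smaller an adapter is than a full weight copy.

```python
import numpy as np

# Minimal sketch, assuming the common LoRA form W' = W + (alpha / r) * B @ A.
# make_adapter / delta / apply_adapter are hypothetical helpers, not a library API.

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8.0

base_W = rng.standard_normal((d_out, d_in))  # one shared, frozen base weight
base_snapshot = base_W.copy()


def make_adapter(seed):
    """Return a low-rank (B, A) pair: the only tensors stored per adapter."""
    g = np.random.default_rng(seed)
    B = g.standard_normal((d_out, r)) * 0.01
    A = g.standard_normal((r, d_in)) * 0.01
    return B, A


def delta(adapter):
    """Dense weight change carried by an adapter: (alpha / r) * B @ A."""
    B, A = adapter
    return (alpha / r) * (B @ A)


def apply_adapter(W, adapter):
    """Materialize an adapted weight; the shared base is never modified."""
    return W + delta(adapter)


adapters = {name: make_adapter(seed) for seed, name in enumerate(["user_a", "user_b"], start=1)}
W_a = apply_adapter(base_W, adapters["user_a"])
W_b = apply_adapter(base_W, adapters["user_b"])

per_adapter = r * (d_in + d_out)  # numbers stored for one adapter
full_copy = d_in * d_out          # numbers stored for one full weight copy
print(per_adapter, full_copy)     # 512 4096
```

Swapping users means swapping a (B, A) pair against the same `base_W`, which is the sense in which many small deltas can stand in for many full models.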

The corpus's sharpest warning about naive arithmetic comes from work on multi-task interference: when parameter changes are merged blindly, tasks step on each other. The fix is to identify the *core* region each task actually depends on, cluster tasks whose core regions overlap, freeze the core parameters, and geometrically merge only the *non-core* parameters; this consistently beats standard multi-task fine-tuning (Can isolating task-specific parameters prevent multi-task fine-tuning interference?). That is parameter-space arithmetic done with surgical guardrails: composition works, but only over the subspaces where the adapters aren't both making load-bearing claims. The implication is that adding two raw LoRA deltas isn't free, because the overlap is where the damage happens.
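The contrast between blind addition and isolation can be sketched on two toy delta vectors. This is a hypothetical illustration, not the cited work's procedure: it assumes a task's "core" is simply its largest-magnitude coordinates, and it uses plain averaging for non-core coordinates as a stand-in for the geometric merge the note describes. `core_mask`, `naive_sum`, and `isolated_merge` are illustrative names.

```python
import numpy as np

# Hypothetical sketch of isolating "core" coordinates before merging two task deltas.
# Assumptions (not from the cited work): a task's core is its top-`frac` coordinates
# by |delta|, and non-core coordinates are merged by plain averaging, a stand-in for
# the geometric merge the note describes.


def core_mask(delta, frac=0.1):
    """Mark the largest-magnitude coordinates (at least k of them if there are ties)."""
    k = max(1, int(frac * delta.size))
    thresh = np.sort(np.abs(delta).ravel())[-k]
    return np.abs(delta) >= thresh


def naive_sum(d1, d2):
    """Blind task arithmetic: add the deltas everywhere."""
    return d1 + d2


def isolated_merge(d1, d2, frac=0.1):
    """Freeze each task's core values; average the remaining coordinates."""
    c1, c2 = core_mask(d1, frac), core_mask(d2, frac)
    merged = 0.5 * (d1 + d2)      # non-core: averaged (stand-in for geometric merge)
    only1, only2 = c1 & ~c2, c2 & ~c1
    merged[only1] = d1[only1]     # task 1's core is kept as-is
    merged[only2] = d2[only2]     # task 2's core is kept as-is
    overlap = c1 & c2             # contested coordinates; left averaged here
    return merged, overlap


# Task 2 pushes against task 1's most important coordinate (index 0).
d1 = np.array([5.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1])
d2 = np.array([-2.0, 4.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.1])

merged, overlap = isolated_merge(d1, d2)
print(naive_sum(d1, d2)[0])  # task 1's core value is dragged from 5.0 down to 3.0
print(merged[0], merged[1])  # each task's core value survives: 5.0 and 4.0
```

The returned `overlap` mask marks coordinates both tasks treat as core; in the note's framing such tasks would be clustered rather than merged, which this sketch only reports.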

Why might LoRA deltas compose cleanly at all? Because what a small adapter often learns is narrow and separable. A 1.5B model with LoRA-only post-training matched larger full-parameter RL models on reasoning tasks, suggesting that the post-training taught *output format and organization* rather than new facts (Can small models reason well by just learning output format?). If an adapter encodes a style or a skill rather than rewriting the model's knowledge, two such adapters are far less likely to collide, because the alignment and the knowledge live elsewhere, untouched.
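The "narrow and separable" argument can be checked on an idealized case. The sketch below assumes two rank-1 adapters that read orthogonal input directions, a deliberately clean stand-in for separable skills rather than a claim about real adapters. Summing them leaves adapter 1's behavior on its own inputs unchanged; once adapter 2 also reads adapter 1's input direction, the sum shifts the output.

```python
import numpy as np

# Idealized sketch (assumption): two rank-1 LoRA deltas whose input directions
# (rows of A) are orthogonal, standing in for "narrow, separable" skills.
d = 6
e = np.eye(d)

B1, A1 = e[:, [0]], e[[0], :]  # adapter 1 reads input direction 0, writes output direction 0
B2, A2 = e[:, [1]], e[[1], :]  # adapter 2 reads direction 1, writes direction 1
delta1, delta2 = B1 @ A1, B2 @ A2

W = np.random.default_rng(0).standard_normal((d, d))  # frozen base
x1 = e[0]  # an input that only adapter 1 responds to

alone = (W + delta1) @ x1
merged = (W + delta1 + delta2) @ x1
print(np.allclose(alone, merged))  # True: adapter 2's A maps x1 to zero

# Overlapping case: if adapter 2 also read direction 0, the sum would move x1's output.
A2_overlap = e[[0], :]
collide = (W + delta1 + B2 @ A2_overlap) @ x1
print(np.allclose(alone, collide))  # False
```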

That 'leave the base untouched' instinct points to a rival to weight arithmetic entirely: composing *at decoding time* instead of in parameter space. Proxy-tuning steers a base model by applying the distributional shift from a tuned model during generation, closing 88-91% of the alignment gap while preserving the knowledge that direct fine-tuning corrupts in the lower layers (Can decoding-time tuning preserve knowledge better than weight fine-tuning?). So two composition strategies are in tension: merge the weights (cheap to serve, risks interference) or blend the outputs (protects knowledge, costs extra inference). The thing you didn't know you wanted to know: 'aligned adapters' compose best precisely *because* the alignment lives in regions you shouldn't be doing arithmetic on, and protecting those regions is exactly why isolation and decoding-time methods exist.
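A minimal sketch of decoding-time composition, assuming the usual proxy-tuning formulation: the large base model's next-token logits are shifted by the difference between a small tuned "expert" and the same small model before tuning (the "anti-expert"). The logit values are made-up numbers over a three-token vocabulary, and `proxy_tuned_probs` is an illustrative name.

```python
import numpy as np

# Sketch of proxy-tuning-style decoding, assuming the usual formulation:
# base logits + (tuned small model logits - untuned small model logits).
# The logit vectors below are made-up numbers over a 3-token vocabulary.


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    p = np.exp(z)
    return p / p.sum()


def proxy_tuned_probs(base_logits, expert_logits, antiexpert_logits):
    """Shift the base distribution by the tuning-induced logit change; base weights are untouched."""
    return softmax(base_logits + (expert_logits - antiexpert_logits))


base = np.array([2.0, 1.0, 0.0])        # large base model
expert = np.array([0.5, 2.5, 0.0])      # small model after tuning
antiexpert = np.array([0.5, 0.5, 0.0])  # same small model before tuning

p_base = softmax(base)
p_proxy = proxy_tuned_probs(base, expert, antiexpert)
print(int(np.argmax(p_base)), int(np.argmax(p_proxy)))  # 0 1
```

Each generated token needs forward passes through three models, which is the inference cost of blending outputs; in exchange, the base weights and the knowledge stored in them stay exactly as they were.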

Sources (4 notes)

Can lightweight adapters replace millions of personalized models?

PEFT adapters function as durable behavioral deltas carrying learned user experience, enabling a single strong base plus millions of lightweight adapters to replace millions of full models, but only when scale-up, scale-down, and scale-out reinforce one another simultaneously.

Can isolating task-specific parameters prevent multi-task fine-tuning interference?

Research shows that identifying core parameter regions per task, clustering overlapping tasks, and freezing core parameters while geometrically merging non-core parameters consistently outperforms standard multi-task fine-tuning. Temporal task scheduling alone proves insufficient without explicit structural parameter isolation.

Can small models reason well by just learning output format?

A 1.5B-parameter model with LoRA-only post-training matched larger full-parameter RL models on reasoning tasks, suggesting that RL teaches output format and organization rather than new factual knowledge. This efficiency indicates that reasoning and knowledge storage are separable capabilities.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.
