INQUIRING LINE

Does parameter composition work when adapter alignment is imperfect?

This explores whether you can merge fine-tuned weights or adapters (like LoRA) into one model when the pieces don't line up cleanly — and what the corpus says about why naive composition breaks and how to rescue it.


Behind "just average the weights" lies a messier reality. The short answer from the corpus: blind composition fails through interference, but composition *does* work if you're surgical about which parameters you touch. The cleanest treatment shows that multi-task interference can be prevented by identifying the small core regions each task actually depends on, freezing those, and geometrically merging only the non-core parameters (see "Can isolating task-specific parameters prevent multi-task fine-tuning interference?"). The key insight is that imperfect alignment isn't fatal everywhere; it's fatal in the core. Most of a model's parameters can be combined safely, and the damage comes from a minority of task-critical weights colliding. So composition works precisely when you stop treating all parameters as equal.
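A minimal sketch of the isolate-then-merge idea on one flattened weight tensor shared by two tasks, assuming NumPy. The cited note does not say how core regions are found or which geometric merge is used, so two stand-ins are assumed here: a task's "core" is the top `frac` of entries by absolute delta from the base, and non-core deltas are combined with spherical linear interpolation (SLERP). The function names and the tie-break for entries both tasks claim are hypothetical.

```python
import numpy as np


def core_mask(delta, frac):
    """Assumed core rule: the top `frac` of entries by |delta| (ties included)."""
    k = max(1, int(round(frac * delta.size)))
    threshold = np.sort(np.abs(delta).ravel())[-k]
    return np.abs(delta) >= threshold


def slerp(a, b, t=0.5, eps=1e-6):
    """Spherical linear interpolation between two vectors, with a lerp fallback."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < eps or nb < eps:
        return (1 - t) * a + t * b
    cos = np.clip(np.dot(a / na, b / nb), -1.0, 1.0)
    omega = np.arccos(cos)
    if np.sin(omega) < eps:  # nearly parallel or antiparallel: lerp is stable
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)


def isolate_and_merge(base, tuned_a, tuned_b, frac=0.1):
    """Keep each task's core entries as-is; SLERP-merge only the non-core deltas."""
    da, db = tuned_a - base, tuned_b - base
    ma, mb = core_mask(da, frac), core_mask(db, frac)
    core = ma | mb

    merged = np.zeros_like(base)
    non_core = ~core
    merged[non_core] = slerp(da[non_core], db[non_core])

    # Core entries are copied from the task that owns them, never averaged.
    merged[ma & ~mb] = da[ma & ~mb]
    merged[mb & ~ma] = db[mb & ~ma]
    # Hypothetical tie-break where both tasks claim an entry: keep the larger |delta|.
    both = ma & mb
    take_a = both & (np.abs(da) >= np.abs(db))
    merged[take_a] = da[take_a]
    merged[both & ~take_a] = db[both & ~take_a]
    return base + merged, core


base = np.zeros(10)
tuned_a = np.full(10, 0.1)
tuned_a[0] = 5.0    # task A depends heavily on entry 0
tuned_b = np.full(10, 0.1)
tuned_b[1] = -4.0   # task B depends heavily on entry 1
merged, core = isolate_and_merge(base, tuned_a, tuned_b, frac=0.1)
print(merged[:2], int(core.sum()))  # both core values survive intact
```

The design choice worth noticing is that plain averaging would have halved both 5.0 and -4.0; isolating the cores keeps each task's critical weight at full strength while the shared non-core entries are blended.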

There's a deeper reason alignment is so often imperfect in the first place: two models can post identical accuracy while organizing their internals in completely different, even "fractured," ways (see "Can models be smart without organized internal structure?"). If two adapters solve the same task through incompatible internal geometry, averaging their weights blends two coherent solutions into one incoherent one. That's why surface metrics can look fine right up until the merged model breaks under distribution shift: the misalignment was invisible to the benchmark.
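One concrete, well-known way two networks can compute the same function with different internal geometry is permutation symmetry: reordering an MLP's hidden units (and the matching rows of the next layer) leaves every output unchanged. The sketch below assumes NumPy and a toy two-layer ReLU network with random weights. It shows that naively averaging two functionally identical models still breaks the function, while undoing the permutation first restores it. This illustrates only one simple source of misalignment; it is not the "fractured representation" analysis from the cited note.

```python
import numpy as np


def mlp(x, W1, b1, W2):
    """Toy two-layer ReLU network: x -> relu(x W1 + b1) W2."""
    return np.maximum(0.0, x @ W1 + b1) @ W2


rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 3))

# Model B: the same function, with hidden units cyclically reordered.
perm = np.roll(np.arange(8), 1)
W1_b, b1_b, W2_b = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(5, 4))
out_a = mlp(x, W1, b1, W2)
out_b = mlp(x, W1_b, b1_b, W2_b)

# Naive weight averaging of the two "equally accurate" models.
out_naive = mlp(x, (W1 + W1_b) / 2, (b1 + b1_b) / 2, (W2 + W2_b) / 2)

# Map B back onto A's unit ordering first, then average.
inv = np.argsort(perm)
out_aligned = mlp(x, (W1 + W1_b[:, inv]) / 2, (b1 + b1_b[inv]) / 2,
                  (W2 + W2_b[inv, :]) / 2)

print("A vs B max gap:       ", np.abs(out_a - out_b).max())
print("A vs naive-avg gap:   ", np.abs(out_a - out_naive).max())
print("A vs aligned-avg gap: ", np.abs(out_a - out_aligned).max())
```

Models A and B agree on every input, so no accuracy metric can tell them apart, yet their naive average is a different function.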

A recurring move in the corpus is to sidestep weight-space composition entirely. Proxy-tuning composes models at *decoding time*: it shifts the large base model's output distribution by the difference between a tuned and an untuned small model, leaving the base weights untouched. It closes 88–91% of the alignment gap and even outperforms direct fine-tuning on knowledge tasks (see "Can decoding-time tuning preserve knowledge better than weight fine-tuning?"). Representation fine-tuning makes a similar bet: it intervenes on frozen hidden states instead of editing weights, reaching 10–50x better parameter efficiency than LoRA (see "Can editing hidden representations beat weight updates for finetuning?"). Both avoid the alignment problem because they never try to make incompatible weight matrices coexist; they compose behavior, not parameters.
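Both decoding-time and representation-level composition fit in a few lines. The sketch below assumes NumPy and toy vectors. Proxy-tuning adds the logit difference between a tuned and an untuned small model to the large base model's logits before the softmax, following the published proxy-tuning formulation. LoReFT edits a hidden state h as h + Rᵀ(Wh + b − Rh), where R has orthonormal rows, following the ReFT paper. Real use would pull logits and hidden states from actual models; the function names and toy values here are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def proxy_tune(base_logits, tuned_small_logits, untuned_small_logits):
    """Shift the large model's next-token logits by the small models' difference.
    The large model's weights are never touched; only outputs are combined."""
    return softmax(base_logits + (tuned_small_logits - untuned_small_logits))


def loreft(h, R, W, b):
    """LoReFT-style edit of a frozen hidden state h (shape (d,)).
    R: (r, d) with orthonormal rows; W: (r, d); b: (r,).
    Only the r-dimensional subspace spanned by R's rows is rewritten."""
    return h + R.T @ (W @ h + b - R @ h)


# Proxy-tuning on a 4-token vocabulary.
base = np.array([2.0, 1.0, 0.5, 0.0])
tuned_small = np.array([0.0, 3.0, 0.0, 0.0])
untuned_small = np.array([0.0, 1.0, 0.0, 0.0])
p_base = softmax(base)
p_proxy = proxy_tune(base, tuned_small, untuned_small)
print(p_base.round(3), p_proxy.round(3))  # token 1 gains probability

# LoReFT on a 6-dim hidden state with a rank-2 intervention.
rng = np.random.default_rng(0)
d, r = 6, 2
Q, _ = np.linalg.qr(rng.normal(size=(d, r)))  # Q: (d, r), orthonormal columns
R = Q.T                                       # R: (r, d), orthonormal rows
W = rng.normal(size=(r, d))
b = rng.normal(size=r)
h = rng.normal(size=d)
h_new = loreft(h, R, W, b)
print(np.allclose(R @ h_new, W @ h + b))  # edited subspace now holds Wh + b
```

In both cases the composed behavior lives outside the base weight matrices: in a sum of logits, or in a small learned map over a subspace of activations.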

The optimistic frame is the adapter-as-state view: a single strong base plus millions of lightweight adapters, each carrying a durable behavioral delta (see "Can lightweight adapters replace millions of personalized models?"). But note the fine print: it holds only when scale-up, scale-down, and scale-out reinforce one another simultaneously. Composition isn't a free lunch; it is conditional on the adapters sharing the same anchored base. That anchor matters because post-training mostly *activates* capabilities already latent in the base rather than installing new ones (see "Can careful curation replace massive alignment datasets?"). Adapters that all point at the same pretrained substrate are far likelier to compose than adapters that have each dragged the model somewhere new.
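The adapter-as-state picture can be sketched with LoRA-style low-rank deltas: one frozen base weight is shared, and each user's adapter is a pair of small matrices applied on the fly. The sketch assumes NumPy, the common LoRA form W₀x + (α/r)·BAx, and a hypothetical `AdapterBank` class. The shape check is only a crude stand-in for the "same anchored base" condition; an adapter trained against a different base of the same shape would pass it.

```python
import numpy as np


class AdapterBank:
    """Hypothetical store: one shared frozen base weight, many low-rank adapters."""

    def __init__(self, base_W, alpha=16.0):
        self.W = base_W        # (d_out, d_in), never modified
        self.alpha = alpha
        self.adapters = {}     # user id -> (A, B)

    def add(self, user, A, B):
        d_out, d_in = self.W.shape
        r = A.shape[0]
        # Crude anchoring check: the delta must at least fit this base's shape.
        if A.shape != (r, d_in) or B.shape != (d_out, r):
            raise ValueError("adapter does not fit this base")
        self.adapters[user] = (A, B)

    def forward(self, x, user=None):
        """x: (n, d_in). Base output plus the user's (alpha / r) * B A x delta."""
        y = x @ self.W.T
        if user is not None:
            A, B = self.adapters[user]
            y = y + (self.alpha / A.shape[0]) * (x @ A.T) @ B.T
        return y


rng = np.random.default_rng(0)
d, r = 64, 4
bank = AdapterBank(rng.normal(size=(d, d)))
for user in ["u1", "u2", "u3"]:
    bank.add(user, rng.normal(size=(r, d)), rng.normal(size=(d, r)) * 0.01)

x = rng.normal(size=(2, d))
print(bank.forward(x).shape, bank.forward(x, "u1").shape)

full_params = d * d               # one full copy of this layer
adapter_params = r * d + d * r    # A and B for one user
print(full_params, adapter_params)
```

Here each adapter costs 512 parameters against 4,096 for a full copy of the layer. For a 4096×4096 projection with r = 8, the same arithmetic gives 16,777,216 base parameters against 65,536 per adapter, a 256x ratio.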

So here is the thing you didn't know you wanted to know: "imperfect alignment" isn't one problem but a fork. If the misalignment lives in non-core parameters, geometric merging absorbs it. If it lives in the core, or in fractured internal representations the two adapters don't share, weight composition quietly corrupts the model, and the better answer is to compose at inference time rather than in the weights.
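One way to make the first branch of the fork operational is to measure how much two tasks' core regions collide before choosing a strategy. The sketch below is a hypothetical heuristic, not a method from the cited notes: it uses a magnitude-based core rule (an assumption), computes the Jaccard overlap of the two core masks, and recommends weight merging only when the overlap is below an arbitrary threshold. It cannot detect the second branch, fractured internal representations, which call for functional checks rather than parameter masks.

```python
import numpy as np


def core_mask(delta, frac=0.05):
    """Assumed core rule: top `frac` of entries by |delta| (ties included)."""
    k = max(1, int(round(frac * delta.size)))
    threshold = np.sort(np.abs(delta).ravel())[-k]
    return np.abs(delta) >= threshold


def core_overlap(delta_a, delta_b, frac=0.05):
    """Jaccard overlap of two tasks' core masks (0 = disjoint, 1 = identical)."""
    ma, mb = core_mask(delta_a, frac), core_mask(delta_b, frac)
    union = np.logical_or(ma, mb).sum()
    return np.logical_and(ma, mb).sum() / union if union else 0.0


def choose_composition(delta_a, delta_b, frac=0.05, max_overlap=0.2):
    """Hypothetical rule: merge weights only if the cores barely collide."""
    if core_overlap(delta_a, delta_b, frac) <= max_overlap:
        return "merge non-core weights, keep cores frozen"
    return "compose at inference time"


rng = np.random.default_rng(0)
n = 1000
delta_a = rng.normal(scale=0.01, size=n)
delta_b = rng.normal(scale=0.01, size=n)
delta_a[:50] += 1.0       # task A's core: entries 0..49
delta_b[500:550] += 1.0   # task B's core: entries 500..549, disjoint from A's
print(choose_composition(delta_a, delta_b))

delta_c = rng.normal(scale=0.01, size=n)
delta_c[:50] -= 1.0       # task C pulls task A's core entries the other way
print(choose_composition(delta_a, delta_c))
```

The threshold of 0.2 and the 5% core fraction are illustrative knobs, not recommended values; in practice they would have to be calibrated against held-out task performance.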


Sources (6 notes)

Can isolating task-specific parameters prevent multi-task fine-tuning interference?

Research shows that identifying core parameter regions per task, clustering overlapping tasks, and freezing core parameters while geometrically merging non-core parameters consistently outperforms standard multi-task fine-tuning. Temporal task scheduling alone proves insufficient without explicit structural parameter isolation.

Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88–91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Can editing hidden representations beat weight updates for finetuning?

ReFT learns task-specific interventions on frozen model representations rather than updating weights, with LoReFT (low-rank linear subspace variant) dramatically outperforming LoRA across reasoning, instruction-following, and NLU benchmarks while using far fewer parameters.

Can lightweight adapters replace millions of personalized models?

PEFT adapters function as durable behavioral deltas carrying learned user experience, enabling a single strong base plus millions of lightweight adapters to replace millions of full models—but only when scale-up, scale-down, and scale-out reinforce simultaneously.

Can careful curation replace massive alignment datasets?

LIMA demonstrates that 1000 carefully curated examples fine-tuned on a strong pretrained model achieve competitive alignment performance with models trained on orders of magnitude more data, showing that post-training activates existing capabilities rather than building new ones.
