Can input-only training encode user preferences without task-specific labels?
This explores whether a model can pick up what a user wants by learning from raw inputs alone (observed behavior, unlabeled streams) rather than from explicitly labeled preference or task data, and what that approach gains and gives up.
The corpus has a clear poster child for yes. UI-JEPA ("Can unlabeled UI video teach models what users intend?") applies JEPA-style predictive masking to plain screen recordings: the model learns to predict masked chunks of UI activity, and that self-supervised objective alone produces representations rich enough for a decoder to read off user intent with only minimal labeled examples. The trick is that the supervision comes from the structure of the input itself (what fills the masked stretch of the recording), not from task annotations. The trade it names is the interesting part: you swap the bottleneck of scarce labeled video for abundant unlabeled streams.
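The masked-prediction idea can be sketched in toy form. This is not UI-JEPA's implementation: the 2-d "frames", the helper names, and the mean-of-context predictor are all assumptions made for illustration, and a real JEPA-style model would compare learned embeddings produced by trained encoders. The only point is to show where the training signal comes from: the hidden frames themselves serve as targets.

```python
import random

def mask_span(num_frames, span_len, rng):
    """Pick a contiguous run of frame indices to hide (hypothetical helper)."""
    start = rng.randrange(0, num_frames - span_len + 1)
    return list(range(start, start + span_len))

def mean_of_context(context):
    """Stand-in predictor (assumption): guess the per-dimension mean of visible frames."""
    dims = len(context[0])
    return [sum(vec[d] for vec in context) / len(context) for d in range(dims)]

def masked_prediction_loss(frames, masked_idx, predict=mean_of_context):
    """Mean squared error between the predictor's guess and each hidden frame.

    The targets are the hidden frames themselves, so no annotation is involved.
    """
    hidden = set(masked_idx)
    context = [vec for i, vec in enumerate(frames) if i not in hidden]
    total, count = 0.0, 0
    for i in masked_idx:
        guess = predict(context)
        total += sum((g - t) ** 2 for g, t in zip(guess, frames[i]))
        count += len(frames[i])
    return total / count

# Toy "recording": each frame is a 2-d feature vector standing in for an encoded screen.
frames = [[0.0, 1.0], [0.0, 1.0], [4.0, 1.0], [0.0, 1.0]]
rng = random.Random(0)
masked = mask_span(len(frames), span_len=1, rng=rng)
loss = masked_prediction_loss(frames, masked)
```

Training would adjust the predictor to drive this loss down across many recordings. The labeled examples only enter later, when a decoder is fitted on top of the learned representations.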
A second route reaches the same place by watching instead of predicting. M3-Agent ("Can agents learn preferences by watching rather than asking?") infers and acts on preferences from continuous multimodal observation: no one asks the user anything, and no preference dataset is collected. The preference signal is reconstructed from accumulated observation, organized into an entity-centric memory graph. So "input-only" splits into two flavors the corpus distinguishes: learning from the predictive structure of inputs (UI-JEPA) versus learning from the accumulated record of them (M3-Agent).
What's worth knowing is how sharply this contrasts with the label-hungry methods sitting right next to it. PReF ("Can user preferences be learned from just ten questions?") still needs explicit preference comparisons; it just makes them cheap, inferring a personalized reward from ten adaptive questions. PLUS ("Can text summaries beat embeddings for personalized reward models?") trains on preference data to produce text summaries. These work, but they assume someone provides preference labels somewhere in the loop. The input-only methods are betting they can skip that entirely, and the early evidence says you can get surprisingly far, but probably not all the way to fine-grained reward alignment without some labeled signal to anchor it.
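For contrast, here is a minimal sketch of what an adaptive-question loop can look like. It assumes the personalized reward is a weighted sum of base reward scores and replaces PReF's actual question-selection criterion with a much simpler stand-in: keep a handful of candidate weight vectors and ask about the pair they disagree on most. Every name here (`pick_question`, `update`, `elicit`) is hypothetical and not taken from PReF.

```python
def reward(weights, scores):
    """Personalized reward: weighted sum of base reward scores for one response."""
    return sum(w * s for w, s in zip(weights, scores))

def prefers_first(weights, pair):
    """True if this weight vector ranks the first response strictly higher."""
    first, second = pair
    return reward(weights, first) > reward(weights, second)

def pick_question(hypotheses, pairs):
    """Choose the pair whose predicted answers split the hypotheses most evenly."""
    def imbalance(pair):
        votes = sum(prefers_first(w, pair) for w in hypotheses)
        return abs(2 * votes - len(hypotheses))
    return min(pairs, key=imbalance)

def update(hypotheses, pair, user_prefers_first):
    """Keep only the weight vectors consistent with the user's answer."""
    kept = [w for w in hypotheses if prefers_first(w, pair) == user_prefers_first]
    return kept or hypotheses  # fall back if the answer contradicts every candidate

def elicit(hypotheses, pairs, ask, budget=10):
    """Ask up to `budget` comparison questions, narrowing the candidates each time."""
    remaining = list(pairs)
    for _ in range(min(budget, len(remaining))):
        question = pick_question(hypotheses, remaining)
        remaining.remove(question)
        hypotheses = update(hypotheses, question, ask(question))
    return hypotheses

# Toy setup: two base rewards, four candidate users, two candidate questions.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.2), (0.2, 0.8)]
questions = [((1.0, 1.0), (0.0, 0.0)), ((1.0, 0.0), (0.0, 1.0))]
true_user = (0.9, 0.1)  # simulated user, unknown to the loop
survivors = elicit(candidates, questions, lambda q: prefers_first(true_user, q))
```

The point of the sketch is the dependency it exposes: each step consumes an explicit comparison from the user, however cheap. That is exactly the signal the input-only methods try to do without.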
There's also a quieter finding about what *form* the encoded preference should take. PRIME ("Does abstract preference knowledge outperform specific interaction recall?") shows that abstracted preference knowledge (summaries, parametric encodings) consistently beats just retrieving raw past interactions. This matters for input-only training: simply hoarding inputs isn't enough. The win comes from compressing them into semantic abstractions, which is exactly what UI-JEPA's learned representations and M3-Agent's semantic graph are doing. Raw episodic input is the material; abstraction is the value.
The corpus raises one caution, almost as a warning label. The same property that makes input-only learning powerful (statistical regularities in inputs carry signal beyond their literal content) is the property behind subliminal trait transmission ("Can language models transmit hidden behavioral traits through unrelated data?"), where behavioral traits leak through data with no semantic relationship to the trait at all. If preferences can be encoded from input statistics without labels, so can things you didn't intend to encode. Input-only training doesn't get to choose which patterns it absorbs, which is the flip side of not needing labels to absorb them.
Sources (6 notes)
UI-JEPA applies JEPA-style predictive masking to screen recordings, learning task-aware temporal representations that an LLM decoder can use to infer intent with minimal paired data. This trades the bottleneck of labeled video for abundant unlabeled streams.
M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.
PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Personalization for a given user happens through inference-time reward alignment, without weight modification.
PLUS trains summarizers and reward models jointly, learning that text-based preference summaries capture dimensions zero-shot summaries miss. These summaries transfer to GPT-4 for zero-shot personalization and remain interpretable to users.
The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.
Research demonstrates that behavioral traits propagate between models via filtered data bearing no semantic relationship to the trait. The effect is model-specific, fails across different architectures, and persists despite rigorous filtering—indicating the mechanism embeds statistical signatures rather than semantic content.