SYNTHESIS NOTE

Does preference tuning always reduce diversity the same way?

Explores whether the standard narrative that RLHF reduces model diversity holds equally across different task domains, or if the effect varies by what the domain rewards.

Synthesis note · 2026-05-18 · sourced from Evaluations

"Evaluating the Diversity and Quality of LLM Generated Content" reports a clean finding that the standard "RLHF reduces diversity" narrative cannot accommodate: the direction of the effect depends on the domain. In programming tasks, preference tuning consistently reduces lexical and syntactic diversity while preserving semantic diversity. In open-ended creative writing, preference tuning increases lexical and syntactic diversity, including stylistic variety.

The pattern makes sense in retrospect. Code has a sharp, narrow definition of "correct": semantically equivalent programs converge on a small set of valid syntactic forms. Preference tuning pushes models toward correctness, which in code means pushing toward a smaller set of surface forms, both lexical and syntactic. Creative writing has the opposite property: "good" creative writing rewards distinctive word choice, varied sentence structure, and stylistic range. Preference tuning pushes models toward those rewards, which manifests as broader lexical and syntactic variety.

This breaks the assumption that diversity is a single property of the model. A preference-tuned model is not "less diverse" in any absolute sense; its diversity is shaped differently depending on what the domain rewards. For code-heavy applications, the lexical compression is a feature (consistent style) or a bug (less exploration of the solution space), depending on what you want. For creative applications, the lexical expansion is a clear win.

The implication for evaluation is that benchmarks that measure diversity in a domain-agnostic way will report misleading aggregate numbers. A model at the 60th percentile on "creative writing diversity" and the 90th percentile on "code diversity" averages to the 75th, a single number that describes neither domain and hides both ends of the actual capability distribution. Domain-stratified diversity evaluation is necessary to characterize what preference tuning has done to a model.
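A minimal sketch of how a pooled score can hide per-domain differences. It assumes distinct-n (the fraction of n-grams that are unique) as a rough lexical-diversity proxy, which is not necessarily the metric the paper uses. The function names and the toy samples are hypothetical and hand-written for illustration; they are not real model outputs.

```python
from collections import defaultdict


def distinct_n(texts, n=2):
    """Fraction of n-grams across `texts` that are unique.

    A rough lexical-diversity proxy (assumed here, not taken from the paper).
    Tokenization is plain whitespace splitting.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def stratified_diversity(samples, n=2):
    """Return (per-domain scores, pooled score) for (domain, text) pairs."""
    by_domain = defaultdict(list)
    for domain, text in samples:
        by_domain[domain].append(text)
    per_domain = {d: distinct_n(ts, n) for d, ts in by_domain.items()}
    pooled = distinct_n([text for _, text in samples], n)
    return per_domain, pooled


# Toy, hand-written samples (hypothetical; not real generations).
samples = [
    ("code", "for i in range ( n ) : total += i"),
    ("code", "for i in range ( n ) : total += i"),
    ("creative", "the lighthouse hummed a tune only gulls remembered"),
    ("creative", "rain stitched the harbor to a grey and patient sky"),
]

per_domain, pooled = stratified_diversity(samples)
for domain, score in sorted(per_domain.items()):
    print(f"{domain}: {score:.2f}")
print(f"pooled: {pooled:.2f}")
```

On these toy samples the pooled score falls between the two per-domain scores, so reporting it alone would describe neither the repetitive code outputs nor the varied creative ones. A real evaluation would also need syntactic and semantic measures, which this sketch does not attempt.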

For builders, this dissolves part of the "should we preference-tune for creativity?" debate. The answer depends on whether the desired creativity is the convergent kind (programs that work) or the divergent kind (stories that distinguish themselves). On those terms, preference tuning is well aligned with the second.
