SYNTHESIS NOTE
Recommender Systems

Can reinforcement learning align summarization with ranking goals?

Generic LLM summaries optimize for readability, not ranking performance. Can training summarizers with downstream relevance scores as rewards fix this misalignment and produce summaries that actually help rankers match queries?

Synthesis note · 2026-05-03 · sourced from Recommenders Architectures

E-commerce search rankers face a length-vs-information tradeoff. Product titles are too sparse; product descriptions are too verbose for cross-encoder rankers under latency budgets. The intuitive fix is to summarize descriptions, but generic LLM summarization optimizes for a "good summary" (readable, faithful), not for a summary that helps the ranker. A summary the LLM judges good might omit precisely the attribute the query asks about.
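A toy illustration of that failure mode, as a minimal sketch rather than anything from the ReLSum work: `ranker_score` is a hypothetical term-overlap stand-in for a real cross-encoder, and the product description and query are invented for the example. Only the attribute-style summary is taken from the note.

```python
def ranker_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a ranker: fraction of query terms found in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().replace(",", " ").replace(".", " ").split())
    return len(query_terms & text_terms) / len(query_terms)


# Invented description: the useful attributes sit in the second sentence.
description = (
    "A wholesome recipe your cat will love. Made with chicken bone broth, "
    "non-GMO ingredients, and added taurine for heart health."
)
generic_summary = "A wholesome recipe your cat will love."  # reads well, drops the attributes
attribute_summary = "Taurine, non-GMO, chicken bone broth"  # terse, keeps the attributes

query = "taurine chicken broth"
full = ranker_score(query, description)
generic = ranker_score(query, generic_summary)
attribute = ranker_score(query, attribute_summary)

print(f"description={full:.2f} generic={generic:.2f} attribute={attribute:.2f}")
```

The description and the attribute summary both cover the query, but only the attribute summary does so within a short input; the fluent summary loses every queried term.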

Doc2Query approaches the problem by generating queries instead of summaries, but its targets are misaligned too: the generated queries are optimized to match documents, not to feed the downstream ranker. In both approaches the learning signal is disconnected from the ranking metric.

ReLSum's contribution is to train the summarizer with reinforcement learning, using as the reward the downstream relevance score that the summary produces. The model learns to keep the tokens that improve recall and NDCG when fed to the ranker, whether or not they make the summary read well. A pet food summary becomes "Taurine, non-GMO, chicken bone broth": three attributes the ranker can match against queries, rather than a fluent paragraph the ranker can't efficiently parse. The framework optimizes the right thing because its training signal is the ranking signal itself, and online metrics show user engagement improvements. The principle generalizes: any intermediate text generation that feeds a downstream model should be trained against that downstream model's objective, not against a generic generation objective.
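The reward-driven training idea can be sketched with plain REINFORCE on a toy problem. This is an illustration under stated assumptions, not ReLSum's actual architecture or reward: the "summarizer" is a per-token keep/drop Bernoulli policy, the "ranker" is a hypothetical query-coverage score with an over-budget penalty, and `TOKENS`, `QUERIES`, `BUDGET`, and all hyperparameters are invented for the example.

```python
import math
import random

# Hypothetical description tokens and training queries (illustrative only).
TOKENS = ["delicious", "premium", "taurine", "recipe", "non-gmo", "broth", "cats", "love"]
QUERIES = [{"taurine", "broth"}, {"non-gmo", "taurine"}, {"broth", "non-gmo"}]
BUDGET = 3  # assumed maximum summary length the ranker can afford


def relevance_reward(summary, query):
    """Toy stand-in for the downstream ranker's relevance score:
    query-term coverage minus a penalty for each token over budget."""
    coverage = len(query & set(summary)) / len(query)
    overflow = max(0, len(summary) - BUDGET)
    return coverage - 0.3 * overflow


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(steps=5000, lr=0.2, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(TOKENS)  # one keep/drop logit per token
    baseline = 0.0  # running mean reward, used to reduce gradient variance
    for _ in range(steps):
        query = rng.choice(QUERIES)
        probs = [sigmoid(z) for z in logits]
        keep = [rng.random() < p for p in probs]  # sample a summary from the policy
        summary = [t for t, k in zip(TOKENS, keep) if k]
        reward = relevance_reward(summary, query)
        advantage = reward - baseline
        # REINFORCE update: d/dz log Bernoulli(keep; sigmoid(z)) = keep - p
        for i, (k, p) in enumerate(zip(keep, probs)):
            logits[i] += lr * advantage * ((1.0 if k else 0.0) - p)
        baseline += 0.05 * (reward - baseline)
    return {t: sigmoid(z) for t, z in zip(TOKENS, logits)}


keep_probs = train()
summary = [t for t in TOKENS if keep_probs[t] > 0.5]
print(summary)
```

Nothing in the reward mentions fluency, so the policy has no reason to keep words like "delicious" or "love"; it is pushed toward whichever tokens raise the ranker-side score within the length budget. A real system would replace the Bernoulli policy with a generative summarizer and the coverage score with the actual ranker's relevance output.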

Original note title

RL-trained query-relevant summaries align summarization with downstream ranking — fixing the misaligned-target problem of generic LLM summarization