Can reinforcement learning align summarization with ranking goals?
Generic LLM summaries optimize for readability, not ranking performance. Can training summarizers with downstream relevance scores as rewards fix this misalignment and produce summaries that actually help rankers match queries?
E-commerce search rankers face a length-vs-information tradeoff. Product titles are too sparse; product descriptions are too verbose for cross-encoder rankers under latency budgets. The intuitive fix is to summarize descriptions, but generic LLM summarization optimizes for being a "good summary" (readable, faithful), not a "summary that helps the ranker". A summary the LLM judges good might omit precisely the attribute the query is asking about.
Doc2Query approaches the problem by generating queries instead of summaries, but query generation also has misaligned targets: the queries are optimized to match documents, not to feed the downstream ranker. Both approaches share the issue that the learning signal isn't connected to the ranking metric.
ReLSum's contribution is to train the summarizer with reinforcement learning where the reward is the downstream relevance score the summary produces. The model learns to keep tokens that improve recall and NDCG when fed to the ranker, regardless of whether they make a summary read well. A pet food summary becomes "Taurine, non-GMO, chicken bone broth": three attributes the ranker can match against queries, rather than a fluent paragraph the ranker can't efficiently parse. Because the reward comes from the ranker itself, the training signal is tied directly to the ranking objective, and the reported online metrics show user engagement improvements. The principle generalizes: any intermediate text generation feeding a downstream model should be trained against that downstream model's loss, not against a generic generation objective.
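The reward-driven training described above can be illustrated with a deliberately small sketch. This is not ReLSum's actual model or training recipe: it assumes a per-token keep/drop policy trained with plain REINFORCE, a stand-in "ranker" that scores term overlap with a query, and a length budget standing in for the latency constraint. All names (`DESCRIPTION`, `QUERIES`, `BUDGET`, `ranker_score`, `reward`, `train`) and all data are hypothetical.

```python
import math
import random

# Hypothetical toy data: one product description and the queries it should serve.
DESCRIPTION = ("our delicious recipe has taurine and is non-gmo with "
               "chicken bone broth that your pet will love").split()
QUERIES = [
    ["taurine", "cat", "food"],
    ["non-gmo", "dog", "food"],
    ["chicken", "bone", "broth"],
]
BUDGET = 5  # assumed token budget standing in for the ranker's latency limit


def ranker_score(summary, query):
    """Stand-in ranker (assumption): fraction of query terms present in the summary."""
    return len(set(summary) & set(query)) / len(query)


def reward(summary):
    """Downstream reward: mean ranker score minus a penalty per token over budget."""
    relevance = sum(ranker_score(summary, q) for q in QUERIES) / len(QUERIES)
    overflow = max(0, len(summary) - BUDGET)
    return relevance - 0.05 * overflow


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(steps=800, batch=32, lr=2.0, seed=0):
    """Learn one keep/drop logit per token with REINFORCE and a batch-mean baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(DESCRIPTION)
    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        samples = []
        for _ in range(batch):
            # Sample a summary: keep each token independently with its probability.
            keep = [1.0 if rng.random() < p else 0.0 for p in probs]
            summary = [t for t, k in zip(DESCRIPTION, keep) if k]
            samples.append((keep, reward(summary)))
        baseline = sum(r for _, r in samples) / batch
        for i, p in enumerate(probs):
            # d/dlogit of log Bernoulli(keep | p) is (keep - p).
            grad = sum((r - baseline) * (keep[i] - p) for keep, r in samples) / batch
            logits[i] += lr * grad
    return [sigmoid(z) for z in logits]


if __name__ == "__main__":
    keep_probs = train()
    print([t for t, p in zip(DESCRIPTION, keep_probs) if p > 0.5])
```

Nothing in the reward mentions fluency or faithfulness, so the policy has no reason to preserve connective words: under these assumptions the kept tokens should concentrate on the attributes that queries can match. A real system would replace the overlap scorer with the production ranker's relevance score and the per-token logits with a generative summarizer.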
Inquiring lines that use this note as a source (14)
This note is a source for these synthesized inquiries.
- Can alignment techniques make LLM explainers match their recommendation behavior?
- How does ranking-aligned summarization compare to aspect-controlled generation methods?
- What tokens do RL-trained summarizers learn to keep for ranking?
- Why do pretrained LLM representations fail at task-specific relevance ranking?
- Can reranking candidate summaries improve perspective representation better than prompting?
- How do personalization errors differ from general accuracy problems in summaries?
- What makes top-N ranking loss difficult to optimize directly?
- Can hierarchical key point structures improve opinion summarization?
- What implicit knowledge about catalogs do LLMs learn from ranking signals alone?
- Why do untrained summarizers focus on topics rather than preference dimensions?
- How does soft parameter sharing in MMoE improve multi-objective ranking systems?
- Why do text-based user summaries outperform embedding vectors for pluralistic alignment?
- Are newer larger language models actually worse at faithful summarization?
- Do fluent generated summaries carry false authority over expert judgment?
Related concepts in this collection (4)
- Can smaller models outperform their LLM teachers with enough data?
Explores whether student models trained on expanded teacher-generated labels can exceed teacher performance in production ranking tasks, and what data scale makes this possible.
complements: both align LLM output to a specific downstream task — distillation aligns scoring; ReLSum aligns summarization
- Does LLM input augmentation beat direct LLM recommendation?
Can LLMs enrich item descriptions more effectively than making recommendations directly? This explores whether specialized models work better when LLMs focus on what they do best: content understanding rather than ranking.
extends: ReLSum is the RL-aligned version of summary-as-input-augmentation — generic LLM summary becomes ranking-aligned summary
- Do comparisons help users evaluate items better than isolated descriptions?
Can framing product evaluations relationally—by comparing to other items—ground assessment in user reasoning better than absolute descriptions? This matters because recommendation explanations often ask users to do comparison work mentally.
complements: aspect-controlled and ranking-aligned generation are alternative LLM-output-shaping methods for downstream recommendation
- Can we distill LLM knowledge into graphs for real-time recommendations?
E-commerce needs sub-millisecond recommendations, but LLMs are too slow. Can we extract LLM insights offline into a knowledge graph that serves requests in production without sacrificing quality or explainability?
complements: same offline-LLM-for-online-recommendation pattern — KG distillation vs ranking-aligned summarization
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Generating Query-Relevant Document Summaries via Reinforcement Learning
- Reranking-based Generation for Unbiased Perspective Summarization
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
- Guiding Large Language Models via Directional Stimulus Prompting
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
- RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
- Large Language Models are Zero-Shot Rankers for Recommender Systems
- RewardBench: Evaluating Reward Models for Language Modeling
Original note title
RL-trained query-relevant summaries align summarization with downstream ranking — fixing the misaligned-target problem of generic LLM summarization