How does ranking-aligned summarization compare to aspect-controlled generation methods?
This explores two ways of steering a summarizer toward a goal rather than toward fluent prose: one trains it to serve a downstream ranking metric, the other controls which aspects or perspectives the summary must cover. It asks what the corpus says about how those strategies differ.
The corpus shows that both approaches start from the same move: they stop treating summarization as 'write a nice paragraph' and start treating it as optimization toward an external target. They just pick different targets.
Ranking-aligned summarization, as in ReLSum (Can reinforcement learning align summarization with ranking goals?), uses the actual relevance score from a downstream search system as a reinforcement-learning reward. The summarizer learns that fluency is beside the point: what wins is dense, attribute-packed text that the ranker can act on. The summary is judged not by how it reads but by whether it improves ranking measures such as recall and NDCG. The training target is a single scalar reward, and the model is tuned to maximize it.
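To make "judged by NDCG" concrete, here is a minimal sketch of NDCG@k computed over graded relevance labels, plus a reward defined as the change in NDCG when a ranker sees the summary. The function names, the linear-gain variant of DCG, and the `ndcg_gain_reward` framing are all assumptions for illustration; the source says only that ReLSum uses the downstream relevance score as its reward, so this is not its actual reward function.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def ndcg_gain_reward(before: Sequence[float], after: Sequence[float], k: int) -> float:
    """Hypothetical reward: how much the ranking improved once the ranker
    was given the summary. Positive means the summary helped."""
    return ndcg_at_k(after, k) - ndcg_at_k(before, k)


# Relevance labels in ranked order, without and with the summary (made-up data).
reward = ndcg_gain_reward(before=[0, 1], after=[1, 0], k=2)
```

Nothing in this reward looks at the summary's wording, which is why a summarizer trained against it has no incentive to stay fluent.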
Aspect-controlled generation, by contrast, optimizes for coverage and balance rather than a scalar score. MODS (Can tailoring queries per document improve debatable summarization?) reframes summarization as a retrieval-and-planning problem: instead of one query applied uniformly, each source document gets its own specialized 'speaker' LLM and a tailored query, which improves topic coverage and balance by 38–58%. The goal isn't to please a ranker; it's to make sure no viewpoint gets flattened out. Where ReLSum compresses toward what's useful, MODS deliberately spreads to capture what's diverse.
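The contrast between a uniform query and per-document queries can be sketched as below. This is only a structural illustration: `Document`, `Planner`, `Speaker`, and the toy planner and speaker are hypothetical stand-ins for LLM calls, and none of it is taken from the MODS implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    text: str


# Hypothetical signatures: a planner writes a query for one document,
# and a speaker answers a query using only its own document.
Planner = Callable[[str, Document], str]
Speaker = Callable[[Document, str], str]


def uniform_queries(topic: str, docs: List[Document]) -> Dict[str, str]:
    """Baseline: every document receives the same query."""
    return {d.doc_id: topic for d in docs}


def tailored_queries(topic: str, docs: List[Document], plan: Planner) -> Dict[str, str]:
    """Per-document planning: each document receives its own query."""
    return {d.doc_id: plan(topic, d) for d in docs}


def collect_responses(docs: List[Document], queries: Dict[str, str], speak: Speaker) -> List[str]:
    """Ask each document's speaker the query that was planned for it."""
    return [speak(d, queries[d.doc_id]) for d in docs]


# Toy stand-ins so the sketch runs without a model.
def toy_planner(topic: str, doc: Document) -> str:
    first_sentence = doc.text.split(".")[0].strip()
    return f"{topic} Focus on: {first_sentence}"


def toy_speaker(doc: Document, query: str) -> str:
    return f"[{doc.doc_id}] answering '{query}'"


docs = [
    Document("a", "Remote work raises productivity. Surveys support this."),
    Document("b", "Remote work weakens mentoring. Junior staff report less feedback."),
]
topic = "Is remote work good for companies?"
queries = tailored_queries(topic, docs, toy_planner)
responses = collect_responses(docs, queries, toy_speaker)
```

In this sketch the planning step is the only place that decides what each document is asked, so a minority viewpoint held by a single document still gets a query aimed at it.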
What connects them is a deeper architectural idea the corpus states plainly elsewhere: separating query planning from answer synthesis reduces interference and improves results on hard, multi-hop queries (Do hierarchical retrieval architectures outperform flat ones on complex queries?). ReLSum bakes the 'what matters' signal into the reward; MODS bakes it into per-document query planning. Both are betting that the summarizer shouldn't decide on its own what to keep, and that this judgment should come from an explicit external structure, whether a reward signal or a planning layer.
So the comparison isn't really 'which is better', because the two answer different questions. If you have a measurable downstream task (a search ranker, a click signal), ranking alignment lets the metric teach the model directly. If you have a contested or many-sided topic where the risk is erasing a perspective, aspect control protects breadth that no single relevance score would reward. The thing worth noticing: a relevance-optimized summarizer would likely fail MODS's balance test, because the highest-scoring summary and the most representative summary are not the same object.
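A toy example shows how the two selection criteria can pick different summaries. Everything here is an assumption made for illustration: the candidate summaries, their relevance scores, the per-sentence viewpoint labels, and the use of normalized Shannon entropy as a balance measure. It is not the metric used by MODS or ReLSum.

```python
import math
from collections import Counter
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    relevance: float          # hypothetical ranker score
    perspectives: List[str]   # hypothetical viewpoint label per sentence


def balance(perspectives: List[str], n_viewpoints: int) -> float:
    """Normalized Shannon entropy of viewpoint labels:
    1.0 when evenly spread over n_viewpoints, 0.0 when only one appears."""
    if n_viewpoints < 2 or not perspectives:
        return 0.0
    counts = Counter(perspectives)
    total = len(perspectives)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_viewpoints)


candidates = [
    Candidate("dense", 0.92, ["pro", "pro", "pro", "pro"]),
    Candidate("broad", 0.71, ["pro", "con", "neutral", "con"]),
]

best_by_relevance = max(candidates, key=lambda c: c.relevance)
best_by_balance = max(candidates, key=lambda c: balance(c.perspectives, 3))
```

With these made-up numbers the relevance criterion selects the "dense" candidate and the balance criterion selects the "broad" one, which is the divergence the paragraph above describes.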
Sources (3 notes)
ReLSum trains summarizers using downstream relevance scores as RL rewards, producing dense, attribute-focused summaries instead of fluent prose. This alignment to the actual ranking metric improves recall, NDCG, and user engagement in production e-commerce search.
MODS achieves 38–58% improvement in topic coverage and balance by assigning each document a specialized speaker LLM that receives tailored queries, rather than applying uniform queries across all documents. This reframes summarization as a retrieval problem solved through source-aware query planning.
Separating query planning from answer synthesis into distinct components reduces interference and improves multi-hop query performance. This architectural principle mirrors documented benefits of separating planning from execution in agent design.