Does better summary writing actually increase user engagement?
When AI systems generate more informative push notifications, do users engage more? This note explores whether informativeness and engagement always align in real product contexts.
LLM-generated summaries for social network push notifications were objectively more informative and customized than existing templates. They did not improve user engagement. The explanation is structural, not quality-related: a well-summarized notification body contains sufficient information that users do not need to open the notification to understand the content. The optimization target (informativeness) directly undermines the business metric (engagement/clicks).
This is an instance of Goodhart's Law operating through content quality: when you optimize for how informative a message is, you can succeed at informativeness while failing at the behavior the informativeness was supposed to drive. The information was meant to entice users to engage; instead, it satisfied their information need at the notification level.
Two compounding factors emerged from the experiments:
Voice alienation: LLM summarization transformed first-person user voice ("I'm looking for a plumber") into third-person reportage ("neighbor asks about plumbers"). This tonal shift alienated recipients by creating distance from the original social context. The content was more polished but less relational — it sounded like a news brief about a neighbor rather than a neighbor reaching out.
Optimization gap: Without a reward model trained specifically for engagement, or model training that builds user preferences into content generation, in-context learning alone cannot shortcut the years of iterative refinement behind established templates. The control templates were the product of multiple iteration cycles; the LLM-generated alternatives were one-shot productions. Even when LLMs produce "better" content by linguistic quality metrics, that does not automatically improve engagement metrics, which depend on alignment with user behavioral patterns.
The broader pattern: LLM-generated content is well suited to rapid prototyping of new products, but using it directly to improve metrics on mature products that have undergone years of A/B testing often fails. The same dynamic appeared in invitation emails, which were more informative and more personalized but not more effective at driving sign-ups. Generic LLM-generated content cannot capture individual preferences without further training.
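As a rough illustration of how a claim like "did not improve engagement" is usually checked in an A/B test, the sketch below compares the click-through rate of a control template against an LLM-written variant with a two-proportion z-test. This is not the method used in the experiments described here, which the note does not specify; the function name and all counts are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, sends_a, clicks_b, sends_b):
    """Compare click-through rates of arm A (control) and arm B (variant).

    Returns (rate_b - rate_a, z statistic, two-sided p-value).
    Assumes large samples so the normal approximation is reasonable.
    """
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    # Pooled rate under the null hypothesis that both arms share one rate.
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical counts, for illustration only (not results from the study).
diff, z, p = two_proportion_z_test(
    clicks_a=520, sends_a=10_000,  # control template
    clicks_b=480, sends_b=10_000,  # LLM-generated summary
)
print(f"CTR difference: {diff:+.4f}, z = {z:.2f}, p = {p:.3f}")
```

The point of the sketch is that the test scores clicks only. A variant can be more informative by any linguistic measure and still show no lift, or a drop, on the metric the test actually measures.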
This connects to the alignment tax discussion in "Does preference optimization harm conversational understanding?". There is a parallel here: optimizing for one communication quality (informativeness) erodes the behavioral outcome it was meant to serve (engagement). The mechanisms differ (RLHF erodes grounding acts, while informativeness optimization removes the motivation to click through), but the pattern is the same: optimizing a proxy metric degrades the downstream target.
Inquiring lines that use this note as a source 7
- How do recommender systems respond to engagement signals from AI-generated content?
- Does higher cognitive load on social media increase engagement?
- How does automated transcript analysis compare to patient self-report on engagement?
- Does the interface design itself shape how much content users will review?
- Does high knowledge density in text reduce user motivation to read more?
- Can reward models trained for engagement fix the informativeness problem?
- What distinguishes proactive information provision from proactive clarification seeking?
Related concepts in this collection 3
- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue.
  Parallel pattern: optimizing for one communication quality undermines the broader communicative goal.
- Can we measure reading efficiency as a quality metric?
  How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally dense.
  High knowledge density in summaries may be the mechanism: too much information per token eliminates the curiosity gap.
- Do language models generate more novel research ideas than experts?
  Explores whether LLMs can break free from expert constraints to generate more novel research concepts. Matters because novelty is often thought to be AI's creative blind spot.
  Parallel dissociation: higher quality on one dimension doesn't translate to effectiveness on the actual goal dimension.
Original note title
more informative AI-generated content paradoxically reduces user engagement because informational sufficiency eliminates the need to click through