Does LLM input augmentation beat direct LLM recommendation?
Is it more effective to have LLMs enrich item descriptions than to have them make recommendations directly? This note explores whether a specialized recommender works better when the LLM is limited to what it does best: understanding content rather than ranking items.
Two paradigms exist for incorporating LLMs into recommender systems. The first uses the LLM as the recommender: build a prompt containing a task description, the user profile, item attributes, and the user-item interaction history, then ask the LLM to predict the probability that the user will interact with the item. The second uses the LLM as an input augmenter: the LLM enriches item descriptions, and the enriched descriptions are then fed to a conventional recommender model.
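To make the first paradigm concrete, here is a minimal sketch of how its prompt might be assembled. It assumes an item is just a title plus a description and that the user profile is free text. The class names, field names, and prompt wording are hypothetical illustrations, not LLM-Rec's prompts or any library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical minimal item record: an id plus textual attributes.
    item_id: str
    title: str
    description: str


@dataclass
class UserContext:
    # Hypothetical user record: a free-text profile and past interactions.
    profile: str
    history: list[Item] = field(default_factory=list)


def build_direct_prompt(user: UserContext, candidate: Item) -> str:
    """Paradigm 1 (LLM-as-recommender): pack the task description, user
    profile, interaction history, and candidate attributes into one prompt
    that asks the LLM for an interaction probability."""
    history_lines = "\n".join(f"- {it.title}: {it.description}" for it in user.history)
    return (
        "Task: estimate how likely the user is to interact with the candidate item. "
        "Answer with a single probability between 0 and 1.\n"
        f"User profile: {user.profile}\n"
        f"Interaction history:\n{history_lines}\n"
        f"Candidate item: {candidate.title}: {candidate.description}"
    )
```

By contrast, a prompt in the second paradigm would contain only item content: no profile, no history, and no request for a score, so the LLM never makes the ranking decision.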
LLM-Rec investigates the second paradigm with three prompt types. P1 instructs the LLM to paraphrase the original content, preserving its information without adding new details. P2 instructs the LLM to summarize the content with tags, producing a more concise overview. P3 instructs the LLM to infer the content's characteristics and answer with categories at a coarser granularity than the original.
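The three prompt types can be sketched as templates that are filled with one item's original description and sent to an LLM. The template wordings below paraphrase the descriptions above and are not the exact prompts from the LLM-Rec paper. `llm` is a hypothetical stand-in for any text-completion call.

```python
from typing import Callable

# Hypothetical wordings paraphrasing P1-P3 as described in this note.
AUGMENTATION_PROMPTS = {
    "P1_paraphrase": (
        "Paraphrase the following item description. Keep all of its information "
        "and do not add new details.\n\n{description}"
    ),
    "P2_tags": (
        "Summarize the following item description as a short list of tags.\n\n{description}"
    ),
    "P3_categories": (
        "From the following item description, infer the item's general characteristics "
        "and answer with broad categories only.\n\n{description}"
    ),
}


def build_augmentation_prompts(description: str) -> dict[str, str]:
    """Fill every template with one item's original description.
    No user data appears here: the LLM only reads item content."""
    return {name: tpl.format(description=description) for name, tpl in AUGMENTATION_PROMPTS.items()}


def augment_item(description: str, llm: Callable[[str], str]) -> dict[str, str]:
    """Run each prompt through `llm` (hypothetical: an API client, a local
    model, etc.) and return one augmented text per prompt type."""
    prompts = build_augmentation_prompts(description)
    return {name: llm(prompt) for name, prompt in prompts.items()}
```

Because augmentation depends only on the item, it can run once per catalog item offline, and the outputs can be cached instead of queried per user request.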
Combining the original description with the augmented texts from these prompts improves recommendation performance over both the original description alone and the LLM-as-recommender approach. The proposed mechanism is that each prompt surfaces a different aspect of the item that the LLM "knows" from pretraining: paraphrasing preserves content but normalizes phrasing, tags compress the description to discriminative attributes, and categories add a coarser hierarchy. The augmented input enriches the recommender's item representation, while the ranking decision stays with the recommender and is not exposed to the biases an LLM shows when it performs the recommendation task itself.
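One simple way to combine the texts is to encode the original description and each augmented text separately and concatenate the embeddings into one item vector. The sketch below assumes exactly that. It uses a hashed bag-of-words encoder as a self-contained stand-in for a real sentence encoder. The "recommender" is an untrained content-based scorer, used only to show where the enriched vectors plug in; a real system would train a recommendation model on these inputs. `DIM`, `encode`, `item_representation`, and `score` are all hypothetical names.

```python
import hashlib
import math

DIM = 64  # size of each per-text embedding (arbitrary choice for this sketch)


def encode(text: str, dim: int = DIM) -> list[float]:
    """Stand-in text encoder: hashed bag of words, L2-normalized.
    A real system would use a learned sentence encoder instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def item_representation(original: str, augmented: dict[str, str]) -> list[float]:
    """Concatenate the original description's embedding with one embedding
    per augmented text, sorted by prompt name so every item has the same layout."""
    parts = [encode(original)] + [encode(augmented[k]) for k in sorted(augmented)]
    return [x for part in parts for x in part]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(history: list[list[float]], candidate: list[float]) -> float:
    """Stand-in recommender: cosine similarity between the candidate and the
    mean of the user's history item vectors (untrained, for illustration)."""
    profile = [sum(col) / len(history) for col in zip(*history)]
    return cosine(profile, candidate)
```

Keeping one embedding per text, instead of concatenating all texts before encoding, lets a downstream model weight the original, paraphrase, tag, and category views differently. Whether that helps in practice is an empirical question this sketch does not settle.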
The methodological lesson is to ask which problems an LLM is good at and match them against what your task actually needs. LLMs are excellent at content understanding (paraphrase, summarization, categorization); they are not specialized recommenders. Letting the LLM do what it is good at, generating enriched textual features, while a specialized model handles the recommendation itself, often beats trying to make the LLM do everything.
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Why do LLM recommenders drop 60 percent recall when missing collaborative signals?
- Which LLM recommender paradigm actually performs best empirically?
- Why do LLM explanations cite similarity and diversity more as options increase?
- How do cost-efficient LLM models compare to high-performance ones in recommendation?
- How does collaborative filtering integrate into LLM-based recommendation systems?
- Why is popularity bias harder to fix in LLM recommenders than in collaborative filtering?
- Can embedding-based integration preserve both LLM text strength and collaborative filtering signal?
- Why do LLM recommenders underperform item-only collaborative filtering baselines?
- How does pretraining corpus popularity bias affect LLM recommendation behavior?
- Which deployment domains favor LLM recommenders over traditional collaborative approaches?
- Can LLMs reliably assess the quality of ideas they generate?
- Does input augmentation outperform direct language-based recommendation systems?
- Why doesn't catalog synchronization matter for LLMs trained on live recommender feedback?
- What implicit knowledge about catalogs do LLMs learn from ranking signals alone?
- How does this differ from using LLMs as the policy itself?
- Can LLMs recommend items without seeing the product catalog?
- Why do LLMs rely on content knowledge instead of collaborative signals?
Related concepts in this collection
- How should language models integrate into recommender systems?
  When building recommendation systems with LLMs, should you use them as feature encoders, token generators, or direct recommenders? The choice affects efficiency, bias, and compatibility with existing pipelines.
  tension with: LLM-Rec evidence shows direct-LLM-as-recommender is the weakest paradigm; input-augmentation outside the taxonomy beats it
- Do prompt techniques work the same across all LLM tiers?
  Do chain-of-thought and rephrasing prompts help or hurt recommendation tasks equally across cost-efficient and high-performance models? Understanding tier-dependent effects could optimize prompt selection.
  complements: rephrasing-as-input-augmentation is exactly the cheap-model-friendly prompt this benchmark identifies
- Can retrieval enhancement fix explainable recommendations for sparse users?
  When users have few historical interactions, embedded recommendation models struggle to generate personalized explanations. Can augmenting sparse histories with retrieved relevant reviews, selected by aspect, overcome this fundamental data limitation?
  complements: aspect-augmentation and content-augmentation are parallel: both use external generation to enrich sparse signal before recommendation
- Can LLMs gain collaborative filtering strength without losing text understanding?
  LLM recommenders excel at cold-start through text semantics but struggle with warm interactions where collaborative patterns matter most. Can external collaborative models be integrated into LLM reasoning to close this gap?
  complements: CoLLM brings CF-into-LLM; LLM-Rec brings LLM-text-into-traditional-recommender; opposite directions of the same hybrid intent
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models
- Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
- CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation
- A Multi-facet Paradigm to Bridge Large Language Model and Recommendation
- Understanding the Role of User Profile in the Personalization of Large Language Models
- Large Language Models as Conversational Movie Recommenders: A User Study
- Large Language Models as Zero-Shot Conversational Recommenders
Original note title
LLM-Rec input augmentation outperforms LLM-as-recommender — content prompting for paraphrase summary and category labels enriches representation