A Multi-facet Paradigm to Bridge Large Language Model and Recommendation
Large Language Models (LLMs) have garnered considerable attention in recommender systems. To achieve LLM-based recommendation, item indexing and generation grounding are two essential steps that bridge recommendation items and natural language. Item indexing assigns a unique identifier to represent each item in natural language, and generation grounding maps the generated token sequences to in-corpus items. However, previous works suffer from inherent limitations in both steps. For item indexing, existing ID-based identifiers (e.g., numeric IDs) lack semantic richness, while description-based identifiers (e.g., titles) lack uniqueness. For generation grounding, unconstrained generation may produce out-of-corpus identifiers. Worse still, autoregressive generation relies heavily on the quality of the initial tokens. To address these issues, we propose a novel multi-facet paradigm, namely TransRec, to bridge LLMs and recommendation. Specifically, TransRec employs multi-facet identifiers that incorporate ID, title, and attribute, achieving both distinctiveness and semantics. Additionally, we introduce a specialized data structure for TransRec to guarantee in-corpus identifier generation and adopt substring indexing to encourage LLMs to generate from any position.
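To make the constrained, position-free generation idea concrete, the following is a minimal sketch, not the TransRec implementation. It assumes whitespace tokenization and uses a suffix trie as a stand-in for the specialized data structure, which the text above does not specify. The names build_suffix_trie and allowed_next_tokens are hypothetical. Because every suffix of every in-corpus identifier is indexed, a generated sequence may start at any position of an identifier, and any token that would lead outside the corpus is never offered.

```python
def build_suffix_trie(identifiers):
    """Index every suffix of every tokenized identifier in a nested-dict trie.

    Assumption: identifiers are split on whitespace; a real system would use
    the LLM's own tokenizer.
    """
    root = {}
    for ident in identifiers:
        tokens = ident.split()
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:]:
                node = node.setdefault(tok, {})
    return root


def allowed_next_tokens(trie, prefix):
    """Return the tokens that keep `prefix` a substring of some identifier."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()  # prefix is already out of corpus
        node = node[tok]
    return set(node.keys())


# Hypothetical title identifiers for two items.
titles = ["the lord of the rings", "lord of war"]
trie = build_suffix_trie(titles)

# Generation may begin mid-title: "lord of" can continue into either item.
print(allowed_next_tokens(trie, ["lord", "of"]))
# A sequence that matches no identifier has no allowed continuation.
print(allowed_next_tokens(trie, ["rings", "of"]))
```

During decoding, the set returned by allowed_next_tokens would be used to mask the LLM's next-token distribution at each step. A suffix trie is chosen here only for brevity; its memory grows quadratically with identifier length, so a compressed substring index would be preferable at scale.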
Introduction. Large Language Models (LLMs) have achieved remarkable success across diverse domains [3, 16, 25] due to their emergent competencies, including rich knowledge [13], instruction following [28], and in-context learning [22]. Recently, there has been a notable surge of interest in adapting LLMs to recommendation. In particular, LLMs have shown potential in discerning nuanced item semantics [32, 36], understanding multiple user interests [9, 35], and generalizing to cross-domain and cold-start recommendations [2, 5, 11]. In light of these findings, harnessing LLMs as recommender systems, i.e., LLM-based recommenders, emerges as a particularly promising avenue for further exploration. Numerous investigations have identified that the key to building LLM-based recommenders lies in bridging the gap between LLMs’ pre-training and recommendation tasks. To narrow this gap, existing work usually represents recommendation data in natural language for instruction tuning on LLMs.
Discussion / Conclusion. In this work, we identified two fundamental steps of LLM-based recommenders: item indexing and generation grounding. To fully utilize LLMs and strengthen the generalization ability of LLM-based recommenders, we posited that identifiers should pursue both distinctiveness and semantics. In addition, constrained generation and position-free generation should be supported to yield accurate recommendations. To pursue these objectives, we proposed a novel transition paradigm, namely TransRec, to seamlessly bridge the language space and the item space. TransRec utilizes multi-facet identifiers to represent an item by its ID, title, and attribute simultaneously. Besides, TransRec supports constrained and position-free generation, which guarantees that every generated identifier belongs to the item corpus. Furthermore, we introduced an aggregated grounding module to ground the generated identifiers to the items.
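As an illustration of how generated multi-facet identifiers might be grounded to items, the sketch below sums, for each item, the scores of all generated identifiers that occur as substrings of that item's identifier in the same facet. The additive scoring rule, the function name aggregate_grounding, and the example items and scores are all assumptions made for illustration. The text above does not specify how the aggregated grounding module combines facets.

```python
def aggregate_grounding(generated, items):
    """Rank items by the summed scores of matching generated identifiers.

    generated: dict mapping facet name -> list of (identifier_text, score)
    items:     dict mapping item key -> dict of facet name -> identifier text
    Assumption: a generated identifier matches an item when it is a substring
    of the item's identifier in the same facet, and scores are simply added.
    """
    totals = {}
    for item_key, facets in items.items():
        total = 0.0
        for facet, candidates in generated.items():
            ident = facets.get(facet, "")
            for text, score in candidates:
                if text and text in ident:
                    total += score
        totals[item_key] = total
    # Highest aggregated score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical corpus with ID, title, and attribute facets.
items = {
    "item_a": {"id": "1023", "title": "the lord of the rings", "attribute": "fantasy"},
    "item_b": {"id": "2048", "title": "lord of war", "attribute": "crime"},
}
# Hypothetical generated identifiers with made-up scores.
generated = {
    "id": [("1023", 0.5)],
    "title": [("lord of", 0.4)],
    "attribute": [("crime", 0.3)],
}

ranking = aggregate_grounding(generated, items)
print([key for key, _ in ranking])
```

Here the ambiguous title fragment "lord of" supports both items, and the ID and attribute facets break the tie, which is the intuition behind combining distinctive and semantic facets.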