SYNTHESIS NOTE
Recommender Systems

Can LLMs explain recommenders by mimicking their internal states?

Can training a language model to align with both a recommender's outputs and its internal embeddings produce explanations that are both faithful and human-readable? This note explores whether such dual-access interpretation resolves the tension between behavioral accuracy and interpretability.

Synthesis note · 2026-05-03 · sourced from Recommenders LLMs

Conventional explainability for recommenders trains a separate surrogate model to mimic the target's predictions and reads off feature importance from the surrogate. This works at a behavioral level (the surrogate predicts what the target predicts) but does not probe the target's internal mechanism. The result is a black-box explanation of a black box.
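A minimal sketch of what "works at a behavioral level" means in practice: surrogate fidelity is usually judged by how closely the surrogate's recommendations match the target's. The function names and the toy scores below are illustrative assumptions, not taken from RecExplainer.

```python
def top_k(scores, k):
    """Return the ids of the k highest-scoring items (ties broken by item id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def top_k_agreement(target_scores, surrogate_scores, k):
    """Fraction of the target's top-k items that the surrogate also ranks in its top-k."""
    target_top = set(top_k(target_scores, k))
    surrogate_top = set(top_k(surrogate_scores, k))
    return len(target_top & surrogate_top) / k


# Toy scores for one user (hypothetical data).
target_scores = {"a": 0.9, "b": 0.7, "c": 0.2, "d": 0.1}
surrogate_scores = {"a": 0.8, "b": 0.1, "c": 0.6, "d": 0.3}

agreement = top_k_agreement(target_scores, surrogate_scores, k=2)  # 0.5
```

A score like this measures only output agreement. Even perfect agreement would not show that the surrogate reaches its predictions the way the target does.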

RecExplainer's three-tier alignment scheme bridges this gap. Behavior alignment is the conventional surrogate: feed the LLM user profile text and train it to predict the items the target recommender would suggest. The LLM learns to reproduce target predictions from textual input.
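One way to picture a behavior-alignment training example is as a prompt and response pair whose label comes from the target recommender, not from the user's actual next interaction. The prompt wording and the helper name `build_behavior_example` are assumptions made for illustration.

```python
def build_behavior_example(history_titles, target_top_items):
    """Build one supervised fine-tuning pair for behavior alignment.

    history_titles:   item titles the user interacted with, as text.
    target_top_items: items the target recommender ranks highest for this user.
                      These are the labels, so the LLM learns to imitate the
                      target, not to predict ground-truth user behavior.
    """
    prompt = (
        "A user has interacted with the following items, in order: "
        + ", ".join(history_titles)
        + ". Which items would the recommender suggest next?"
    )
    response = ", ".join(target_top_items)
    return {"prompt": prompt, "response": response}


# Hypothetical user history and target output.
example = build_behavior_example(
    ["The Matrix", "Inception"],
    ["Interstellar", "Tenet"],
)
```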

Intention alignment goes deeper. Instead of giving the LLM only text, this tier feeds the target recommender's neural-layer activations (the embeddings of users and items in the target's latent space) into the LLM's prompt. The LLM is fine-tuned to treat these embeddings as a second modality alongside text, much as a multimodal model treats non-text input. Predictions now draw on the target's internal representation, not just its outputs.
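A rough sketch of how a recommender embedding could enter an LLM's input sequence, assuming a learned linear projection into the LLM's embedding space and a placeholder position in the prompt. Both are assumptions borrowed from common multimodal practice, not confirmed details of RecExplainer. Plain Python lists stand in for tensors, and all dimensions and values are toy data.

```python
def project(embedding, weight, bias):
    """Linear map from the recommender's latent space to the LLM's embedding space.

    weight has one row per LLM dimension; each row has one entry per recommender dimension.
    """
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weight, bias)
    ]


def splice(token_vectors, placeholder_positions, projected_vectors):
    """Replace the vectors at placeholder positions with projected recommender embeddings."""
    out = list(token_vectors)
    for pos, vec in zip(placeholder_positions, projected_vectors):
        out[pos] = vec
    return out


# Toy setup: recommender embeddings have 2 dimensions, LLM embeddings have 3.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # would be learned during fine-tuning
bias = [0.0, 0.0, 0.0]
item_embedding = [0.5, -1.0]                   # taken from the target recommender

projected = project(item_embedding, weight, bias)  # [0.5, -1.0, -0.5]

# Three prompt tokens; position 1 is the placeholder for the item embedding.
token_vectors = [[0.1, 0.1, 0.1], [0.0, 0.0, 0.0], [0.2, 0.2, 0.2]]
llm_input = splice(token_vectors, [1], [projected])
```

The LLM then processes `llm_input` as it would any embedded token sequence, which is what lets fine-tuning teach it to read the target's latent space.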

Hybrid alignment combines both: text and embeddings together. The LLM produces explanations that integrate the human-interpretable reasoning the text supports and the high-fidelity behavior matching the embeddings provide.
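The three tiers can be contrasted by what the prompt contains for each item in the user's history. The sketch below assumes a hypothetical placeholder token `<item_emb>` that marks where a recommender embedding would be substituted, and it assumes that intention alignment replaces item titles with embeddings; the note does not spell out the exact prompt format.

```python
EMB = "<item_emb>"  # hypothetical placeholder, later swapped for a recommender embedding


def build_prompt(history_titles, mode):
    """Render a user's history for one of the three alignment modes."""
    if mode == "behavior":
        parts = list(history_titles)                       # text only
    elif mode == "intention":
        parts = [EMB for _ in history_titles]              # embeddings only
    elif mode == "hybrid":
        parts = [f"{title} {EMB}" for title in history_titles]  # text and embeddings
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return "User history: " + ", ".join(parts) + ". Recommend the next item."


history = ["The Matrix", "Inception"]  # hypothetical titles
behavior_prompt = build_prompt(history, "behavior")
intention_prompt = build_prompt(history, "intention")
hybrid_prompt = build_prompt(history, "hybrid")
```

In the hybrid prompt each title sits next to its embedding slot, so the model can tie the readable name to the target's internal representation of the same item.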

The general principle: when you need to interpret a black-box model, behavioral mimicry and internal-state inspection are complementary. Each alone is partial: behavioral mimicry misses the mechanism, and internal inspection misses the human-readable explanation. Combining them produces explanations that are both faithful to the target and intelligible to users. The pattern plausibly generalizes beyond recommendation: any interpretation problem in which the target's internal states are accessible could use the same dual access.

Original note title

RecExplainer uses LLM as surrogate model with three alignment methods — behavior intention and hybrid for recommendation interpretability