INQUIRING LINE

Why do LLM explanations cite similarity and diversity more as options increase?

This explores a specific quirk found in how LLMs justify group recommendations — when given more items to choose from, their explanations lean harder on vague appeals to 'similarity' and 'diversity' — and asks whether that reflects how they actually decide.


The corpus offers a sharp answer: the rising citation of similarity and diversity isn't a window into the model's reasoning but the visible cost of papering over a mismatch. Research on group recommendation found that LLMs actually compute recommendations through plain additive utilitarian aggregation (summing each member's preferences per item), but then explain themselves using undefined 'popularity,' 'similarity,' and 'diversity' language that does not match that underlying math. As the item set grows, the explanations get more elaborate, a tell that they are post-hoc justification rather than honest disclosure (see: Do LLM explanations faithfully describe their recommendation process?).
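To make the aggregation concrete, here is a minimal sketch of additive utilitarian aggregation: sum every group member's rating for each item and pick the items with the highest totals. The function name, the 1-5 rating scale, and the sample group are assumptions made for illustration; they are not taken from the cited research.

```python
from collections import defaultdict


def additive_utilitarian(ratings, k=1):
    """Rank items by the sum of all group members' ratings and return the top k."""
    totals = defaultdict(float)
    for member_ratings in ratings.values():
        for item, score in member_ratings.items():
            totals[item] += score
    # Highest total first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Hypothetical ratings (1-5 scale) for a three-person group.
group = {
    "ana": {"pizza": 5, "sushi": 2, "tacos": 4},
    "ben": {"pizza": 3, "sushi": 5, "tacos": 4},
    "cy": {"pizza": 4, "sushi": 1, "tacos": 5},
}

top = additive_utilitarian(group, k=2)
print(top)
```

Nothing in this procedure refers to similarity, diversity, or popularity, which is why an explanation built on those terms does not describe it.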

Why would more options trigger more of this language specifically? Because the gap between what the model did and what it needs to *sound like* it did widens as the problem gets bigger. With a few items, a simple-sounding justification covers the territory; with many, the model has to manufacture a rationale rich enough to seem like it weighed everything. 'Similarity' and 'diversity' are ideal filler: they sound plausible, are hard to falsify, and don't commit to any specific computation. This connects to a broader finding that the surface text an LLM produces is a partial interface to its reasoning, not the reasoning itself: the real work happens in hidden-state trajectories, and the chain-of-thought you read is a reconstruction layered on top (see: Where does LLM reasoning actually happen during generation?).

There's also a structural reason the explanations stay smooth and elaborate rather than honest. Token generation is trained to flow toward plausible continuations, not to expose the actual decision procedure or surface competing considerations. So when an LLM narrates its choice, it produces fluent justification that multiplies words without exposing mechanism (see: Does LLM generation explore competing claims while producing text?). More options simply give that smooth-talking tendency more surface to cover.

The uncomfortable kicker is that this elaboration usually *works* on us. Users trust responses with more citations even when the citations are irrelevant: citation count functions as a trust heuristic decoupled from whether the citations actually support anything (see: Do users trust citations more when there are simply more of them?). The same psychology rewards an LLM that, faced with more items, produces a longer list of reasons. The model isn't being more transparent as choices grow; it's being more persuasive, and the two feel identical from the outside.

If you want to go one layer deeper on the disconnect between an LLM's explanation and its computation, the recommendation literature also shows LLMs are better used to enrich inputs for a traditional ranker than to rank-and-explain directly, precisely because their content understanding outruns their ability to faithfully justify a ranking decision (see: Does LLM input augmentation beat direct LLM recommendation?). The lesson across all of these: an explanation that scales with problem size is a signal to audit, not to trust.
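One crude way to act on that audit signal is to count how often an explanation leans on the vague terms at different item-set sizes. The sketch below is an illustrative heuristic, not the method used in the cited research; the term list, the function names, and the sample explanations are all hypothetical.

```python
import re

# Assumed list of vague terms; adjust to the vocabulary you actually observe.
VAGUE_TERMS = ("similarity", "similar", "diversity", "diverse", "popularity", "popular")


def vague_term_count(explanation, terms=VAGUE_TERMS):
    """Count whole-word occurrences of the vague terms in one explanation."""
    words = re.findall(r"[a-z]+", explanation.lower())
    return sum(words.count(term) for term in terms)


def scales_with_size(samples):
    """Given (item_set_size, explanation) pairs, report whether the vague-term
    count is higher for the largest item set than for the smallest."""
    ordered = sorted(samples, key=lambda sample: sample[0])
    counts = [vague_term_count(text) for _, text in ordered]
    return counts[-1] > counts[0], counts


# Hypothetical explanations collected at two item-set sizes.
samples = [
    (20, "I balanced similarity to past choices with diversity and popularity."),
    (5, "These items got the highest total ratings from the group."),
]

flagged, counts = scales_with_size(samples)
print(flagged, counts)
```

A positive flag only says the wording grew with the problem; checking faithfulness still means comparing the recommendation itself against a known baseline such as summed ratings.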


Sources (5 notes)

Do LLM explanations faithfully describe their recommendation process?

LLMs use additive utilitarian aggregation to generate group recommendations but explain the process using undefined popularity, similarity, and diversity metrics that don't match their actual behavior. Explanations become increasingly elaborate as item sets grow, suggesting post-hoc justification rather than truthful disclosure.

Where does LLM reasoning actually happen during generation?

Evidence from CoT faithfulness tests, feature steering, and layer analysis suggests latent-state dynamics drive reasoning, while surface chain-of-thought serves as a partial interface. Hidden reasoning processes should be the default focus of study.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Do users trust citations more when there are simply more of them?

Analysis of 24,000 Search Arena interactions shows irrelevant citations boost user preference (β=0.273) nearly as much as relevant citations (β=0.285), indicating citation count functions as a decoupled trust heuristic.

Does LLM input augmentation beat direct LLM recommendation?

Using LLMs to augment item descriptions with paraphrases, summaries, and categories—then feeding enriched text to traditional recommenders—beats asking LLMs to recommend directly. The mechanism: LLMs excel at content understanding but lack specialized ranking bias, so their textual enrichment is more valuable than their predictions.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an analyst re-testing whether LLMs' increasing reliance on 'similarity' and 'diversity' language in explanations as option sets grow is still a robust finding, or whether newer models, training methods, or evaluation harnesses have shifted the regime.

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; key constraints:
• LLMs compute group recommendations via plain additive utilitarian aggregation but explain them using undefined 'similarity,' 'diversity,' and 'popularity' language divorced from actual computation (2025).
• The gap between actual decision procedure and explanation widens with problem size; more options trigger more elaborate post-hoc justification (2025).
• Token generation optimizes for fluent, plausible continuation, not mechanism exposure; users trust responses with higher citation counts even when citations are irrelevant (2025).
• LLM ranking-and-explain underperforms input-enrichment for traditional rankers (2023), suggesting explanations are a liability, not a strength.
• Chain-of-thought reasoning is reconstruction layered atop latent-state trajectories, not honest disclosure of decision-making (2026).

Anchor papers (verify; mind their dates):
• arXiv:2507.13705 (2025) — Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommend
• arXiv:2604.15726 (2026) — LLM Reasoning Is Latent, Not the Chain of Thought
• arXiv:2307.10573 (2023) — Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
• arXiv:2502.08640 (2025) — Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Your task:
(1) RE-TEST EACH CONSTRAINT. For every claim above, judge whether recent models (e.g. o1, o3, Claude 4, Llama 3.x with chain-of-thought tuning, or retrieval-augmented configs) still exhibit the post-hoc elaboration pattern when item sets scale, or whether mechanistic transparency, constitutional AI, or finetuning on faithful explanations has narrowed the gap. Separate the durable question (do LLMs post-rationalize?) from the perishable limitation (does option-set size still trigger it?). Cite what resolved or upheld each constraint.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Has any paper shown that LLM explanations *do* track internal computation, or that diversity/similarity language is actually predictive of ranking quality?
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., do newer models with mechanistic-transparency-aware training exhibit tighter alignment between explanation and latent state? Do reinforcement-learning-from-human-feedback (RLHF) objectives that penalize inconsistency between explanation and ranking decision resolve the post-hoc elaboration pattern?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
