Why does chain-of-thought reasoning fail for personalization?
Standard reasoning traces produce logically sound but personally irrelevant answers. This note explores why generic thinking doesn't anchor to user preferences and what might fix it.
PRIME documents a two-layer failure in applying reasoning to personalization:
Layer 1: Generic CoT fails. Enabling standard chain-of-thought often underperforms the non-thinking baseline for personalization tasks. The uncustomized reasoning trace "merely scratches the surface, seeking broad answers rather than to-the-point, user-specific responses." Generic reasoning explores the problem space without being anchored to the specific user's preferences, values, or communication style — producing reasoning that is logically sound but personally irrelevant.
Layer 2: Fine-tuning destroys thinking capacity. The "fast thinking" training paradigm (direct input→output mapping) turns fine-tuned LLMs into specialist models overfitted to the target space. They lose the generalist ability to generate meaningful intermediate thoughts when prompted. A common failure is token repetition: having been trained to shortcut directly to outputs, the model can no longer produce coherent intermediate reasoning. This is not a minor degradation; the fine-tuned model has effectively lost the capacity to think when asked to.
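To make the token-repetition symptom concrete, here is a minimal sketch of a heuristic that flags a degenerate thinking trace by measuring how many of its n-grams repeat. This is an illustration only, not something described in PRIME: the function names, whitespace tokenization, the n-gram size, and the 0.5 threshold are all assumptions chosen for the example.

```python
from collections import Counter
from typing import List


def repeated_ngram_ratio(tokens: List[str], n: int = 3) -> float:
    """Fraction of n-grams that duplicate an n-gram seen earlier in the trace."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # trace too short to contain any n-gram
    counts = Counter(ngrams)
    # Each n-gram occurring c times contributes c - 1 duplicates.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def looks_degenerate(trace: str, n: int = 3, threshold: float = 0.5) -> bool:
    """Hypothetical check: flag a trace whose n-grams mostly repeat.

    Whitespace splitting stands in for a real tokenizer, and the
    threshold is an arbitrary assumption, not a tuned value.
    """
    return repeated_ngram_ratio(trace.split(), n) > threshold
```

A check like this could be run over the thoughts a fine-tuned model emits when prompted to reason, to estimate how often it collapses into repetition rather than producing a usable trace.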
The fix: personalized self-distillation. The model generates its own personalized thinking traces (using its pre-fine-tuning generalist capability), then trains on those traces alongside the standard fine-tuning objective. This produces reasoning that is both user-specific (anchored to the individual's preferences) and deep (maintaining the capacity for intermediate thought). The self-distillation approach leverages the model's own capabilities rather than requiring external reasoning trace data.
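The data-construction side of this idea can be sketched as follows. This is a hedged illustration under stated assumptions, not PRIME's actual implementation: the `Example` fields, the prompt wording, the `<think>` delimiters, and the choice to mix one with-thought record and one direct record per example are all hypothetical. The `generate_thought` callable stands in for the pre-fine-tuning model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    user_profile: str  # hypothetical: text summary of the user's history or preferences
    query: str
    target: str        # the user-specific reference response


def build_thinking_prompt(ex: Example) -> str:
    """Ask the generalist model for a thought anchored to this user (assumed wording)."""
    return (
        f"User profile:\n{ex.user_profile}\n\n"
        f"Query:\n{ex.query}\n\n"
        "Think step by step about what this specific user would want. "
        "Write only the reasoning, not the final answer."
    )


def self_distill(
    examples: List[Example],
    generate_thought: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build a training set that mixes thought-augmented and direct records.

    generate_thought is assumed to wrap the model *before* fine-tuning,
    so the traces come from its still-intact generalist capability.
    """
    records: List[Dict[str, str]] = []
    for ex in examples:
        prompt = f"User profile:\n{ex.user_profile}\n\nQuery:\n{ex.query}"
        thought = generate_thought(build_thinking_prompt(ex))
        # Record 1: the model learns to emit its own personalized thought, then the answer.
        records.append(
            {"prompt": prompt, "completion": f"<think>{thought}</think>\n{ex.target}"}
        )
        # Record 2: the standard direct input -> output fine-tuning objective.
        records.append({"prompt": prompt, "completion": ex.target})
    return records
```

In use, `generate_thought` would call the base model's generation API, and the resulting records would be passed to an ordinary supervised fine-tuning loop. The point of the sketch is only that the reasoning traces are produced by the model itself and conditioned on the individual user, rather than sourced from external reasoning data.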
This finding extends the reasoning/judgment split documented elsewhere. In the terms of "When does explicit reasoning actually help model performance?", personalization is a clear case of "continuous nuanced judgment": matching preferences, style, and implicit expectations cannot be reduced to logical derivation steps. But PRIME shows the split is not absolute: personalized reasoning can help, provided the reasoning traces themselves are customized to the user.
The connection to "Why does asking models to think first hurt performance?" is structural: both findings show that thinking initially hurts but becomes helpful once the thinking process is adapted to the domain. In PRIME's case, self-distillation is the adaptation mechanism; in the TPO case, RL training is. The shared principle: raw thinking capability must be tuned to the domain before it adds value.
Inquiring lines that use this note as a source
- Why does chain-of-thought reasoning hurt recommendation tasks specifically?
- What makes historical user outputs more effective for personalization than semantic similarity?
- How does personalization differ mechanically from retrieval-augmented generation?
- Does semantic memory improve AI personalization more than episodic memory?
- Why does extending reasoning traces worsen persona consistency?
- When does combining episodic and semantic memory reduce personalization performance?
Related concepts in this collection
- When does explicit reasoning actually help model performance?
  Explicit reasoning improves some tasks but hurts others. What determines whether step-by-step reasoning chains are beneficial or harmful for a given problem?
  Link to this note: personalization as a specific instance of the judgment-degradation zone.
- Why does asking models to think first hurt performance?
  Initial prompts to generate internal thoughts degrade instruction-following performance. What reverses this harm, and can thinking become useful beyond math and logic?
  Link to this note: a parallel case. Thinking hurts until adapted; self-distillation and RL are distinct adaptation mechanisms.
- Does reflection in reasoning models actually correct errors?
  When reasoning models reflect on their answers, do they genuinely fix mistakes, or merely confirm what they already decided? Understanding this matters for designing better training and inference strategies.
  Link to this note: personalized thinking is a case where reflection must be customized to add value.
- Can user preferences be learned from just ten questions?
  Explores whether adaptive question selection can efficiently infer user-specific reward coefficients without historical data or fine-tuning. This matters for scaling personalization without per-user model updates.
  Link to this note: PReF addresses the same "generic fails, personalized succeeds" pattern at the reward level. A single reward function underperforms because it flattens individual preferences; factored rewards capture user-specific dimensions just as personalized thinking traces capture user-specific reasoning patterns.
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes
- Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting
- Rethinking Thinking Tokens: LLMs as Improvement Operators
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap
- Chain of Thoughtlessness? An Analysis of CoT in Planning
- InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks
Original note title
generic reasoning underperforms non-thinking for personalization tasks — personalized thinking via self-distillation is required because fast-thinking fine-tuning destroys generalist reasoning capability