Can models recognize how individuals reason differently?
Do language models capture the distinct reasoning paths and strategic styles that individual humans use when reaching the same conclusion? Current evaluations ignore this dimension entirely.
Different people arrive at the same conclusion through distinct reasoning paths. In social deduction games such as Avalon, players facing identical information adopt different strategies: some track voting patterns, others read behavioral cues, and others reason counterfactually about what different role assignments would imply. These are individualized reasoning styles, and existing theory-of-mind (ToM) evaluations ignore them entirely.
InMind proposes a framework built on dual-layer cognitive annotations: strategy traces capturing real-time reasoning signals (belief updates, intention inference, counterfactual thinking) and reflective summaries offering post-hoc contextualization of key events. Two gameplay modes — Observer (passive reasoning from another player's perspective) and Participant (active engagement) — enable both capturing and evaluating individualized reasoning.
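To make the two annotation layers and the two gameplay modes concrete, here is a minimal sketch of how one annotated session might be represented. It is an illustration only, not InMind's actual data schema: the class names, the `round_index` and `event` fields, and the `traces_in_round` helper are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mode(Enum):
    OBSERVER = "observer"        # passive reasoning from another player's perspective
    PARTICIPANT = "participant"  # active engagement in the game


class SignalType(Enum):
    # The three real-time reasoning signals named in the note.
    BELIEF_UPDATE = "belief_update"
    INTENTION_INFERENCE = "intention_inference"
    COUNTERFACTUAL = "counterfactual"


@dataclass
class StrategyTrace:
    """One real-time reasoning signal (layer 1)."""
    round_index: int  # hypothetical field: round in which the thought occurred
    signal: SignalType
    text: str


@dataclass
class ReflectiveSummary:
    """Post-hoc contextualization of a key event (layer 2)."""
    event: str  # hypothetical field: label of the event being reflected on
    text: str


@dataclass
class AnnotatedSession:
    """Both annotation layers for one player in one game."""
    player_id: str
    mode: Mode
    traces: List[StrategyTrace] = field(default_factory=list)
    reflections: List[ReflectiveSummary] = field(default_factory=list)

    def traces_in_round(self, round_index: int) -> List[StrategyTrace]:
        # Hypothetical helper: select the real-time signals from one round.
        return [t for t in self.traces if t.round_index == round_index]


session = AnnotatedSession(player_id="p1", mode=Mode.OBSERVER)
session.traces.append(
    StrategyTrace(2, SignalType.BELIEF_UPDATE, "The vote on quest 2 raises my suspicion.")
)
session.reflections.append(
    ReflectiveSummary("quest 2 vote", "That vote was the turning point for my read.")
)
```

Keeping traces indexed by round while reflections are keyed to events mirrors the distinction between real-time signals and post-hoc summaries. Any check of whether a reflection is anchored in specific gameplay would have to link the two layers.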
Four tasks evaluate distinct aspects:
- Player Identification: Can the model recognize behavioral patterns aligned with a specific reasoning style?
- Reflection Alignment: Can it ground abstract post-game reflections in concrete game behavior?
- Trace Attribution: Can it simulate evolving in-context reasoning across time?
- Role Inference: Can it internalize reasoning styles to support belief modeling under uncertainty?
The evaluation of 11 LLMs reveals critical limitations. GPT-4o "frequently relies on lexical cues, struggling to anchor reflections in temporal gameplay or adapt to evolving strategies." The model latches onto surface-level language patterns rather than tracking the temporal evolution of reasoning. Temporal alignment between reflective reasoning and specific in-game events "remains challenging for nearly all evaluated models."
DeepSeek-R1 shows "early signs of style-sensitive reasoning" — suggesting that extended reasoning training may begin to capture individualized patterns where standard models cannot. But dynamic adaptation of strategic reasoning based on evolving interactions "is largely insufficient" across all models.
The implication: ToM evaluation that only checks whether the model gets the right answer misses whether it arrived there through a reasoning path that matches the individual it's modeling. Two correct answers can reflect completely different (and incompatible) reasoning styles.
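The gap between output matching and path matching can be shown with a toy example. The sketch below is not the metric InMind uses. It assumes each player's reasoning can be reduced to a sequence of hypothetical step labels, and it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure.

```python
from difflib import SequenceMatcher
from typing import List


def answer_match(conclusion_a: str, conclusion_b: str) -> bool:
    """Output-only check: did both reasoners reach the same conclusion?"""
    return conclusion_a == conclusion_b


def path_similarity(path_a: List[str], path_b: List[str]) -> float:
    """Illustrative path check: similarity of two step sequences in [0, 1]."""
    return SequenceMatcher(None, path_a, path_b).ratio()


# Hypothetical step labels for two players who accuse the same person.
player_a_path = ["vote_tracking", "vote_tracking", "belief_update"]
player_b_path = ["behavioral_cue", "counterfactual", "belief_update"]
player_a_conclusion = "player_3_is_evil"
player_b_conclusion = "player_3_is_evil"

same_answer = answer_match(player_a_conclusion, player_b_conclusion)
similarity = path_similarity(player_a_path, player_b_path)
print(f"same answer: {same_answer}, path similarity: {similarity:.2f}")
```

An evaluation that only computes `answer_match` treats the two players as interchangeable. A path-level measure, however crude, shows that their reasoning overlaps only in the final belief update.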
Inquiring lines that use this note as a source 15
- Does the langue-parole distinction apply to human reasoning too?
- Where do humans and language models actually diverge in reasoning ability?
- How does open-ended evolver reasoning identify patterns across heterogeneous user trajectories?
- Can extended reasoning training capture individual strategic thinking styles?
- How does reasoning instability prevent models from modeling individuals?
- Why do language models capture individual differences in cognitive behavior?
- Why do language models approximate collective human judgment better than individuals?
- How do humans and R1 models differ in information gain patterns?
- What makes multi-hypothesis generation better than single-path social reasoning?
- What makes reasoning models worse at understanding people?
- Can reasoning style be steered as a single linear direction?
- Can theory of mind models generalize across structurally similar scenarios?
- How many distinct quasi-persons does a single language model actually support?
- How do language models track multiple negotiating parties' commitments simultaneously?
- What causes language models' strategic rationality to decline with increased game complexity?
Related concepts in this collection 3
- Do large language models use one reasoning style or many?
  Explores whether LLMs share a universal strategic reasoning approach or develop distinct styles tailored to specific game types. Understanding this matters for predicting model behavior in competitive versus cooperative scenarios.
  InMind adds the human-side dimension: not just model-specific reasoning profiles but player-specific trajectories that models fail to capture.
- Does any single persuasion technique work for everyone?
  Can fixed persuasion strategies like appeals to authority or social proof be reliably applied across different people and situations, or do they require adaptation to individual traits and context?
  Individualized reasoning styles are why universal strategies fail in persuasion too: the reasoning path matters, not just the conclusion.
- Why do LLM persona prompts produce inconsistent outputs across runs?
  Can language models reliably simulate different social perspectives through persona prompting, or does their run-to-run variance indicate they lack stable group-specific knowledge? This matters for whether LLMs can approximate human disagreement in annotation tasks.
  Persona instability may explain why LLMs fail at individualized reasoning: they cannot maintain stable models of individual reasoning styles.
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
- Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?
- Strategic Reasoning with Language Models
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- Eliciting Reasoning in Language Models with Cognitive Tools
- InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
- Fast, Slow, and Tool-augmented Thinking for LLMs: A Review
Original note title
individualized reasoning styles — distinct reasoning trajectories reaching similar conclusions — require cognitively grounded evaluation beyond output matching