SYNTHESIS NOTE

Can models recognize how individuals reason differently?

Do language models capture the distinct reasoning paths and strategic styles that individual humans use when reaching the same conclusion? Current evaluations ignore this dimension entirely.

Synthesis note · 2026-02-22 · sourced from Theory of Mind

Different people arrive at the same conclusion through distinct reasoning paths. In social deduction games such as Avalon, players facing identical information adopt different strategies: some track voting patterns, others read behavioral cues, and others reason counterfactually about what different role assignments would imply. These are individualized reasoning styles, and existing theory-of-mind (ToM) evaluation ignores them entirely.

InMind proposes a framework built on dual-layer cognitive annotations: strategy traces capturing real-time reasoning signals (belief updates, intention inference, counterfactual thinking) and reflective summaries offering post-hoc contextualization of key events. Two gameplay modes — Observer (passive reasoning from another player's perspective) and Participant (active engagement) — enable both capturing and evaluating individualized reasoning.
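The dual-layer annotation described above can be pictured as a small data model. The following is a minimal sketch only: the class and field names (`AnnotatedSession`, `round_index`, `event_rounds`, and so on) are hypothetical and are not taken from InMind's actual schema, which this note does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SignalKind(Enum):
    # The three real-time reasoning signals named in the note.
    BELIEF_UPDATE = "belief_update"
    INTENTION_INFERENCE = "intention_inference"
    COUNTERFACTUAL = "counterfactual"


class Mode(Enum):
    OBSERVER = "observer"        # passive reasoning from another player's perspective
    PARTICIPANT = "participant"  # active engagement in the game


@dataclass
class StrategyTrace:
    """Layer 1: one real-time reasoning signal recorded during play."""
    round_index: int  # hypothetical field: the game round the signal belongs to
    kind: SignalKind
    text: str


@dataclass
class ReflectiveSummary:
    """Layer 2: post-hoc contextualization of key events."""
    text: str
    event_rounds: List[int] = field(default_factory=list)  # hypothetical field


@dataclass
class AnnotatedSession:
    player_id: str
    mode: Mode
    traces: List[StrategyTrace] = field(default_factory=list)
    summary: Optional[ReflectiveSummary] = None

    def traces_for_round(self, round_index: int) -> List[StrategyTrace]:
        # Collect every real-time signal recorded in the given round.
        return [t for t in self.traces if t.round_index == round_index]


# Toy usage with made-up annotations.
session = AnnotatedSession(player_id="p1", mode=Mode.OBSERVER)
session.traces.append(
    StrategyTrace(2, SignalKind.BELIEF_UPDATE, "Vote pattern shifted my suspicion.")
)
session.traces.append(
    StrategyTrace(3, SignalKind.COUNTERFACTUAL, "If p4 were loyal, the quest would pass.")
)
session.summary = ReflectiveSummary("Round 2 was the turning point.", event_rounds=[2])
```

Keeping the two layers as separate types reflects the distinction drawn in the note: traces are tied to a moment in play, while the summary looks back across the whole game.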

Four tasks evaluate distinct aspects of how well a model captures and applies these individualized reasoning styles.

The evaluation of 11 LLMs reveals critical limitations. GPT-4o "frequently relies on lexical cues, struggling to anchor reflections in temporal gameplay or adapt to evolving strategies." The model latches onto surface-level language patterns rather than tracking the temporal evolution of reasoning. Temporal alignment between reflective reasoning and specific in-game events "remains challenging for nearly all evaluated models."

DeepSeek-R1 shows "early signs of style-sensitive reasoning" — suggesting that extended reasoning training may begin to capture individualized patterns where standard models cannot. But dynamic adaptation of strategic reasoning based on evolving interactions "is largely insufficient" across all models.

The implication: ToM evaluation that only checks whether the model gets the right answer misses whether it arrived there through a reasoning path that matches the individual it's modeling. Two correct answers can reflect completely different (and incompatible) reasoning styles.
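To make the gap between output matching and path matching concrete, here is a toy sketch. It is not InMind's metric: the cosine similarity over strategy-label frequencies is a simple stand-in chosen for illustration, and the labels, answers, and function names are all hypothetical.

```python
import math
from collections import Counter


def style_profile(trace_kinds):
    """Turn a list of strategy labels into a normalized frequency profile."""
    counts = Counter(trace_kinds)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def evaluate(human, model):
    """Report answer agreement and reasoning-style agreement separately."""
    return {
        "answer_match": human["answer"] == model["answer"],
        "style_similarity": cosine_similarity(
            style_profile(human["trace_kinds"]),
            style_profile(model["trace_kinds"]),
        ),
    }


# Made-up example: same conclusion, entirely different strategies.
human = {
    "answer": "player_3_is_evil",
    "trace_kinds": ["voting_pattern"] * 3 + ["counterfactual"],
}
model = {
    "answer": "player_3_is_evil",
    "trace_kinds": ["behavioral_cue"] * 4,
}
result = evaluate(human, model)
print(result)
```

In this toy case an output-only check would score the model as fully correct, while the style score shows no overlap with the reasoning path of the person being modeled.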

Original note title

individualized reasoning styles — distinct reasoning trajectories reaching similar conclusions — require cognitively grounded evaluation beyond output matching