INQUIRING LINE

How do humans and R1 models differ in information gain patterns?

This explores whether the corpus directly compares how humans and DeepSeek-R1-style reasoning models acquire and use information as they think — and where their reasoning trajectories converge or diverge.


The honest starting point: the collection has no single study that measures "information gain" as a head-to-head metric between humans and DeepSeek-R1. Several notes do cover the same territory under different vocabulary, and together they sketch where the real differences lie.

The closest direct evidence is on reasoning *trajectories*. One study tracking models over evolving gameplay found that DeepSeek-R1 showed early promise at adapting its reasoning to an opponent's shifting strategy, while GPT-4o leaned on surface lexical cues — yet all models, R1 included, fell short of tracking how an individual's reasoning style changes over time Can models recognize how individuals reason differently?. The takeaway for information gain: humans continuously update from temporal, contextual signals, whereas current models — even reasoning-tuned ones — tend to harvest information statically rather than accumulating it across a dynamic interaction.
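A toy Bayesian sketch makes the contrast between harvesting information statically and accumulating it concrete. It is not drawn from the cited study. The two opponent strategies, the move likelihoods, and the helper names (`entropy`, `bayes_update`) are hypothetical, and "information gain" is assumed here to mean the drop in Shannon entropy of the observer's belief. A cumulative observer folds every move into its belief, while a static one stops after the first move.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def bayes_update(prior, likelihoods):
    """Posterior over hypotheses given each hypothesis's likelihood for one observation."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical setup: two candidate opponent strategies, "aggressive" and "cautious".
# Each row gives the likelihood of one observed move under [aggressive, cautious].
observations = [
    [0.7, 0.3],  # move 1 mildly favors "aggressive"
    [0.8, 0.2],
    [0.9, 0.1],
]
prior = [0.5, 0.5]

# Cumulative observer: folds every observation into its belief.
cumulative = prior
for obs in observations:
    cumulative = bayes_update(cumulative, obs)

# Static observer: forms its belief from the first observation, then stops updating.
static = bayes_update(prior, observations[0])

# Information gain measured as entropy reduction relative to the prior.
gain_cumulative = entropy(prior) - entropy(cumulative)
gain_static = entropy(prior) - entropy(static)
print(f"cumulative gain: {gain_cumulative:.3f} bits")
print(f"static gain:     {gain_static:.3f} bits")
```

With these made-up numbers the cumulative observer ends up far more certain than the static one. The point is only that information gain compounds when each new observation is folded in, not that any model behaves exactly like either observer.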

A more mechanistic clue comes from work on what post-training does to a model's internal information dynamics. Reasoning-style training measurably shifts a model from passive next-token prediction toward *enaction* — recognizing that its own outputs become its future inputs — and this shows up as 3–4x lower output entropy on-policy Do models recognize their own outputs as actions shaping future inputs?. That entropy collapse is the flip side of human information gain: where a person exploring a problem keeps options open, a trained reasoning model narrows fast onto a self-consistent trajectory. Lower entropy can mean sharper reasoning, but it also means the model may be gathering less *new* information per step than it appears to.
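Output entropy here means the Shannon entropy of the model's next-token distribution, averaged over the tokens of a trajectory. The sketch below shows how such a comparison might be computed, assuming access to per-step logits. The vocabulary size, the logit values, and the function names are hypothetical. In the measurement described above the trajectories would be the model's own samples (on-policy); here they are invented to set a sharper, post-trained-style distribution next to a flatter one.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy_nats(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mean_token_entropy(logit_steps):
    """Average per-step entropy of the next-token distribution across a trajectory."""
    return sum(entropy_nats(softmax(l)) for l in logit_steps) / len(logit_steps)

# Hypothetical logits over a 4-token vocabulary at three generation steps.
base_steps = [[1.0, 0.8, 0.5, 0.2], [0.9, 0.9, 0.4, 0.1], [1.2, 0.7, 0.6, 0.3]]
# Hypothetical post-trained model: a much stronger preference for the top token.
tuned_steps = [[4.0, 0.8, 0.5, 0.2], [3.6, 0.9, 0.4, 0.1], [4.2, 0.7, 0.6, 0.3]]

h_base = mean_token_entropy(base_steps)
h_tuned = mean_token_entropy(tuned_steps)
print(f"base: {h_base:.3f} nats, tuned: {h_tuned:.3f} nats, ratio: {h_base / h_tuned:.1f}x")
```

A lower mean entropy says the model commits harder at each step; on its own it does not say whether the committed trajectory is better or worse, which is the ambiguity the paragraph above points to.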

The surprising counterweight is how human-like models look on the inside of a single reasoning act. On natural-language inference, syllogisms, and Wason tasks, LLMs reproduce human content effects almost item-by-item — the same belief-bias errors, the same sensitivity to whether a conclusion *sounds* true Do language models show the same content effects humans do?. So at the level of a discrete inference, humans and models draw on information in strikingly parallel ways; the divergence is in the dynamics across time, not the static shape of a single step.

The deeper framing the corpus offers is that "difference" here depends on your vantage point. From the outside, humans and LLMs are categorically different systems; from inside a shared discourse, both draw on the same symbolic substrate, making the gap structural rather than absolute Do humans and LLMs differ fundamentally or just superficially?. To go further, the finding that fine-tuned models can out-predict theory-driven cognitive models of human decision-making suggests the two information-processing systems are close enough that one can model the other Can language models learn to model human decision making?. The takeaway: the human/R1 gap is not mainly about *what* information gets used in a single reasoning step. It is about whether information keeps being gathered and updated as the situation evolves, and there the models still flatten where humans stay open.


Sources (5 notes)

Can models recognize how individuals reason differently?

LLMs struggle to anchor reasoning in temporal gameplay and adapt to evolving strategies. GPT-4o relies on surface lexical cues while DeepSeek-R1 shows early promise, but dynamic style adaptation remains largely insufficient across all models tested.

Do models recognize their own outputs as actions shaping future inputs?

Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.

Do language models show the same content effects humans do?

LLMs show identical content-sensitivity patterns to humans on NLI, syllogisms, and Wason tasks, with belief-bias signatures matching human error rates item-by-item. This behavioral isomorphism across three independent tasks suggests that content and logical form are architecturally inseparable in transformer reasoning.

Do humans and LLMs differ fundamentally or just superficially?

Applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains open: how do humans and R1-style reasoning models differ in information gain patterns during problem-solving?

What a curated library found — and when (dated claims, not current truth): Studies from 2022–2026 sketch these constraints:
• DeepSeek-R1 shows early promise at adapting its reasoning to a shifting opponent strategy, while GPT-4o relies on surface lexical cues; yet all models tested fail to track *how an individual's reasoning style changes over time*. Interpretation: humans continuously update from temporal, contextual signals, while models harvest information statically [[individualized-reasoning-styles-distinct-reasoning-trajectories-reaching-similar]].
• Post-training reasoning shifts models from passive prediction toward self-enacting outputs, cutting output entropy 3–4x on-policy; this narrowing may reduce *new information per step* even as reasoning sharpens [[post-training-shifts-a-model-from-passive-prediction-to-enaction-where-it-recogn]] (~2026).
• On discrete inference tasks (syllogisms, Wason), LLMs reproduce human belief-bias errors item-by-item — same sensitivity to semantic plausibility [[content-effects-in-llms-are-behavioral-confirmation-that-semantic-content-and-lo]] (2022).
• Fine-tuned LLMs can outpredict theory-driven cognitive models of human decision-making, suggesting the two systems are close enough for one to model the other [[llms-finetuned-on-psychological-experiment-data-become-generalist-cognitive-mode]].

Anchor papers (verify; mind their dates):
• arXiv:2207.07051 (2022) — content effects in LLM reasoning
• arXiv:2605.25459 (2026) — post-training shifts to enaction
• arXiv:2505.19590 (2025) — learning to reason without external rewards
• arXiv:2404.01869 (2024) — reasoning behavior evaluation survey

Your task:
(1) RE-TEST EACH CONSTRAINT. For static reasoning (a single inference step), test whether the human–R1 parity holds or deepens. Test whether newer architectures or training regimes (reinforcement learning, chain-of-thought variants, multi-step scaffolding) have *preserved or relaxed* the entropy collapse. Separately, probe whether models trained on extended reasoning tasks now *accumulate* information across turns, updating priors dynamically as humans do. Cite the mechanism (if any) that enables or blocks temporal information gain.
(2) Surface the strongest work from the last ~6 months that contradicts the entropy-as-constraint narrative or shows R1-family models *do* update beliefs across extended reasoning.
(3) Propose two research questions that assume the regime may have moved: (a) Do R1 models with memory or agentic scaffolding recover human-like cumulative information gain? (b) Does the *structure* of reasoning traces (e.g., backtracking, hypothesis refinement) differ between humans and R1 in ways that reveal different information-gathering strategies?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
