How do humans and R1 models differ in information gain patterns?
This explores whether humans and R1-style reasoning models differ in how they extract and use information while reasoning. The honest starting point: the collection has no single study that measures "information gain" as a head-to-head metric between humans and DeepSeek-R1. But several notes circle the same territory under different vocabulary, and together they sketch where the real differences live.
The closest direct evidence concerns reasoning *trajectories*. One study tracking models over evolving gameplay found that DeepSeek-R1 showed early promise at adapting its reasoning to an opponent's shifting strategy, while GPT-4o leaned on surface lexical cues. Yet all models, R1 included, fell short of tracking how an individual's reasoning style changes over time (Can models recognize how individuals reason differently?). The takeaway for information gain: humans continuously update from temporal, contextual signals, whereas current models, even reasoning-tuned ones, tend to harvest information statically rather than accumulate it across a dynamic interaction.
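The notes above never define "information gain" formally. One common formalization, assumed here purely for illustration, treats it as the drop in entropy of a belief after an observation. The sketch below applies that idea to the gameplay setting: an observer holds a belief over an opponent's strategy and updates it move by move. The strategy names, move likelihoods, and move sequence are all hypothetical and are not taken from the cited study.

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a belief given as {hypothesis: probability}."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def bayes_update(belief, likelihoods, move):
    """Return the posterior belief over strategies after observing one move."""
    unnormalized = {s: belief[s] * likelihoods[s][move] for s in belief}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

# Hypothetical model: how likely each strategy is to produce each move.
likelihoods = {
    "aggressive": {"attack": 0.8, "defend": 0.2},
    "cautious": {"attack": 0.3, "defend": 0.7},
}

belief = {"aggressive": 0.5, "cautious": 0.5}  # uniform prior
gains = []
for move in ["attack", "attack", "defend"]:
    posterior = bayes_update(belief, likelihoods, move)
    # Realized gain for this step: entropy before minus entropy after.
    # It can be negative when a move contradicts the current belief.
    gains.append(entropy(belief) - entropy(posterior))
    belief = posterior

for step, gain in enumerate(gains, start=1):
    print(f"step {step}: information gain = {gain:+.3f} bits")
```

In this toy setup the third move cuts against the belief built up by the first two, so the observer's uncertainty rises again. A learner that keeps updating in this way registers the shift; one that fixes its estimate after the first observation does not, which is the static-versus-dynamic contrast drawn above.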
A more mechanistic clue comes from work on what post-training does to a model's internal information dynamics. Post-training measurably shifts a model from passive next-token prediction toward *enaction*, meaning the model recognizes that its own outputs become its future inputs, and this shows up as 3–4x lower on-policy output entropy (Do models recognize their own outputs as actions shaping future inputs?). That entropy collapse is the flip side of human information gain: where a person exploring a problem keeps options open, a post-trained reasoning model narrows quickly onto a self-consistent trajectory. Lower entropy can mean sharper reasoning, but it can also mean the model is gathering less *new* information per step than it appears to.
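To make the entropy comparison concrete, here is a minimal sketch of how output entropy is computed for a next-token distribution. The two distributions are invented for illustration, chosen only so that their ratio lands in the 3–4x range mentioned above; they are not measurements from the cited note.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions over four candidate tokens.
exploratory = [0.25, 0.25, 0.25, 0.25]  # options kept open
committed = [0.90, 0.05, 0.03, 0.02]    # narrowed onto one continuation

h_open = shannon_entropy(exploratory)
h_narrow = shannon_entropy(committed)
ratio = h_open / h_narrow

print(f"exploratory: {h_open:.3f} bits")
print(f"committed:   {h_narrow:.3f} bits")
print(f"ratio:       {ratio:.2f}x")
```

The committed distribution still assigns some probability to alternatives, but most of the mass sits on one token, so each sampled token resolves far less uncertainty than a draw from the exploratory one.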
The surprising counterweight is how human-like models look within a single reasoning act. On natural-language inference, syllogisms, and Wason tasks, LLMs reproduce human content effects almost item-by-item: the same belief-bias errors, the same sensitivity to whether a conclusion *sounds* true (Do language models show the same content effects humans do?). So at the level of a discrete inference, humans and models draw on information in strikingly parallel ways; the divergence lies in the dynamics across time, not in the static shape of a single step.
The deeper framing the corpus offers is that "difference" here depends on your vantage point. From the outside, humans and LLMs are categorically different systems; from inside a shared discourse, both draw on the same symbolic substrate, making the gap structural rather than absolute (Do humans and LLMs differ fundamentally or just superficially?). If you want to go further, the work showing that fine-tuned models can out-predict theory-driven cognitive models of human decision-making suggests the two information-processing systems are close enough that one can model the other (Can language models learn to model human decision making?). The key point: the human/R1 gap isn't mainly about *what* information gets used in a single reasoning step. It's about whether information keeps being gathered and updated as the situation evolves, and there the models still flatten where humans stay open.
Sources (5 notes)
LLMs struggle to anchor reasoning in temporal gameplay and adapt to evolving strategies. GPT-4o relies on surface lexical cues while DeepSeek-R1 shows early promise, but dynamic style adaptation remains largely insufficient across all models tested.
Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.
LLMs show identical content-sensitivity patterns to humans on NLI, syllogisms, and Wason tasks, with belief-bias signatures matching human error rates item-by-item. This behavioral isomorphism across three independent tasks suggests that content and logical form are architecturally inseparable in transformer reasoning.
Applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.
LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.