Do language models make rational strategic decisions in games?

Explores whether LLMs consistently apply game-theoretic reasoning to reach optimal strategies, and whether their performance holds as games become more complex. Understanding this matters for deploying LLMs in negotiation and competitive settings.

Synthesis note · 2026-06-03 · sourced from Reasoning Logic Internal Rules

Strategic decision-making, meaning choosing actions that maximize expected utility given others' likely choices, is a demanding test of LLM rationality. The work evaluates several frontier LLMs on complete-information games (Prisoner's Dilemma, Stag Hunt, etc.) and incomplete-information games (Deal-No-Deal). It finds that LLMs frequently deviate from rational strategies, and that the deviation grows with game complexity (larger payoff matrices, deeper sequential trees). The fix is procedural: game-theoretic workflows that guide the model's reasoning and decision-making toward computing Nash equilibria. With the workflow, LLMs identify optimal strategies far better, reach near-optimal negotiation allocations, and become less exploitable.
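To make "computing Nash equilibria" concrete, here is a minimal sketch of the kind of check such a workflow could delegate to code for a small complete-information game. It is not the workflow from the source: the function name `pure_nash_equilibria` is hypothetical, the payoff numbers are textbook-style illustrations rather than values from the evaluation, and it covers pure strategies only.

```python
from itertools import product


def pure_nash_equilibria(row_payoffs, col_payoffs):
    """Return every (row_action, col_action) pair where neither player
    can gain by deviating alone (pure-strategy Nash equilibria)."""
    n_rows, n_cols = len(row_payoffs), len(row_payoffs[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player: r must be a best response to column action c.
        row_best = all(row_payoffs[r][c] >= row_payoffs[alt][c]
                       for alt in range(n_rows))
        # Column player: c must be a best response to row action r.
        col_best = all(col_payoffs[r][c] >= col_payoffs[r][alt]
                       for alt in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Illustrative payoffs (assumed, not taken from the source).
# Action 0 = cooperate / hunt stag, action 1 = defect / hunt hare.
PD_ROW = [[3, 0], [5, 1]]
PD_COL = [[3, 5], [0, 1]]
SH_ROW = [[4, 0], [3, 3]]
SH_COL = [[4, 3], [0, 3]]

if __name__ == "__main__":
    print("Prisoner's Dilemma:", pure_nash_equilibria(PD_ROW, PD_COL))
    print("Stag Hunt:", pure_nash_equilibria(SH_ROW, SH_COL))
```

With these assumed payoffs, the Prisoner's Dilemma has a single equilibrium at mutual defection, while the Stag Hunt has two, so the second game also poses an equilibrium-selection problem. Enumeration like this grows with the size of the payoff matrix, which is one plausible reason unscaffolded reasoning degrades as games get larger.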

The takeaway has the same shape seen elsewhere: raw LLM rationality is unreliable and degrades with complexity, but an external reasoning scaffold that imposes the formal structure (here, game-theoretic computation) recovers it. The capability is latent but needs the workflow to be reliably elicited.

This connects the vault's strategic-reasoning and workflow-scaffolding threads. It complements Do large language models use one reasoning style or many? (rationality isn't a uniform capability) and Why do standard dialogue systems fail at tracking negotiation agreement?, and the workflow-restores-capability pattern echoes Can LLMs actually forecast time series better than we think?.

Inquiring lines that read this note 3

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

Do different game types reveal different strategic reasoning capabilities in LLMs?

Why do language models reinforce false assumptions instead of correcting them?

How do language models track multiple negotiating parties' commitments simultaneously?

Do language models learn genuine linguistic structure or just surface patterns?

What causes language models' strategic rationality to decline with increased game complexity?

Related concepts in this collection 3

Do large language models use one reasoning style or many?
Explores whether LLMs share a universal strategic reasoning approach or develop distinct styles tailored to specific game types. Understanding this matters for predicting model behavior in competitive versus cooperative scenarios.
both find strategic rationality is not a uniform general capability

Why do standard dialogue systems fail at tracking negotiation agreement?
Standard dialogue state tracking monitors one user's goals, but negotiation requires tracking both parties' evolving positions simultaneously. Why is this bilateral requirement fundamentally different, and what makes existing models insufficient?
the negotiation-state-tracking demand these workflows must satisfy

Can LLMs actually forecast time series better than we think?
Explores whether language models possess stronger forecasting ability than current benchmarks suggest, and what role workflow design plays in revealing or hiding that capability.
same workflow-restores-latent-capability pattern