INQUIRING LINE

Do stated character beliefs predict decisions better when extracted from text?

This explores whether knowing a character's stated beliefs — pulled out of narrative or document text — actually helps predict the choices they make, and where text-extracted belief breaks down as a predictor.


This reads the question as: when you mine a character's psychology from text and feed it back to a model, do the resulting predictions of their decisions get better? The corpus says yes, but with a sharp caveat about what kind of belief you extract and from where. The clearest 'yes' comes from the LIFECHOICE work (1,462 decisions across 388 novels), where LLMs predicted character choices more accurately when given an expert-written persona profile paired with memories retrieved for their relevance to that character's psychology, beating automated summarization by about 5% (Can LLMs predict character choices from narrative context?). The signal isn't just 'who is this character' but 'which of their past moments matter to this decision.' Pulling a persona from documents can also generalize: stakeholder personas semantically clustered out of domain texts transfer across evaluation tasks without redesign, suggesting text-grounded belief profiles carry portable predictive structure (Can personas extracted from documents generalize across evaluation tasks?).
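A minimal sketch of the setup in the paragraph above: a persona profile paired with the memories most relevant to the decision at hand, rather than a flat summary. Everything here is illustrative. The `Memory` type, the word-overlap `relevance` score, and the sample character are assumptions invented for this example, not the LIFECHOICE method or data; a real system would likely use an embedding model for retrieval.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str


def tokens(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for a real embedding model.
    return {w.strip(".,;:!?'\"").lower() for w in text.split()} - {""}


def relevance(memory: Memory, query: str) -> float:
    # Jaccard overlap between a memory and the decision text (assumed scoring).
    a, b = tokens(memory.text), tokens(query)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(memories: list[Memory], query: str, k: int = 2) -> list[Memory]:
    # Keep the k memories most relevant to this decision, not the whole history.
    return sorted(memories, key=lambda m: relevance(m, query), reverse=True)[:k]


def build_prompt(persona: str, memories: list[Memory], decision: str) -> str:
    # Assemble persona + retrieved memories + decision into one prompt string.
    chosen = retrieve(memories, decision)
    lines = [f"Persona: {persona}", "Relevant memories:"]
    lines += [f"- {m.text}" for m in chosen]
    lines.append(f"Decision: {decision}")
    return "\n".join(lines)


# Hypothetical character, invented for illustration only.
persona = "Values loyalty to family over personal ambition."
memories = [
    Memory("She turned down a promotion to care for her brother."),
    Memory("She once got lost in the market as a child."),
    Memory("Her brother lent her money when she lost her job."),
]
decision = "Should she leave town for a new job or stay with her brother?"
prompt = build_prompt(persona, memories, decision)
print(prompt)
```

The design choice worth noticing is that retrieval is keyed on the decision, so the childhood memory about the market drops out of the prompt while the two memories about her brother stay in.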

But here's the thing you might not expect: stated beliefs can out-predict the text entirely. In debate corpora, readers' political and religious ideology labels predicted persuasion outcomes better than the linguistic features of the argument, and language effects measured without controlling for who's listening turned out to be confounded by audience composition (Does what readers believe matter more than what debaters say?). So 'extracted from text' isn't automatically the winning move; sometimes a cheap identity label for the believer beats the rich textual signal. That reframes the question: extraction helps when the text encodes belief the label can't, and loses when the label already captures it.
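The confounding argument can be made concrete with a toy calculation. The counts below are invented for illustration and are not taken from the debate corpora: two hypothetical audiences differ in how often they are persuaded, and a linguistic feature happens to appear more often in front of the more receptive one.

```python
# Hypothetical counts: (audience, feature_present) -> (persuaded, total).
# Invented to illustrate confounding; not data from any study.
counts = {
    ("receptive", True): (64, 80),
    ("receptive", False): (16, 20),
    ("skeptical", True): (4, 20),
    ("skeptical", False): (16, 80),
}


def rate(cells):
    # Persuasion rate over a collection of (persuaded, total) cells.
    persuaded = sum(p for p, _ in cells)
    total = sum(t for _, t in cells)
    return persuaded / total


def pooled_rate(feature):
    # Ignore the audience: this is the uncontrolled comparison.
    return rate([v for (_, f), v in counts.items() if f == feature])


def within_rate(audience, feature):
    # Hold the audience fixed: this is the controlled comparison.
    return rate([counts[(audience, feature)]])


pooled_gap = pooled_rate(True) - pooled_rate(False)
within_gaps = {
    a: within_rate(a, True) - within_rate(a, False)
    for a in ("receptive", "skeptical")
}
print(f"pooled gap: {pooled_gap:.2f}")
print({a: round(g, 2) for a, g in within_gaps.items()})
```

In this toy table the feature looks worth 36 points of persuasion when audiences are pooled (68% versus 32%) and nothing at all once the audience is held fixed (80% either way for one group, 20% either way for the other). The apparent language effect is carried entirely by who was listening.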

There's also a deeper rival to text extraction: learning the decision function directly. LLMs fine-tuned on psychology-experiment data became generalist cognitive models that out-predicted theory-driven models and captured individual differences in their embeddings, no hand-written belief statement required (Can language models learn to model human decision making?). And whole AI-persona pipelines reproduced 76% of published experimental main effects (84 of 111, drawn from Journal of Marketing experiments), with success tracking the strength of the original evidence rather than the eloquence of the persona (Can AI personas reliably replicate human experiment results?). Belief-as-text is one lever; belief-as-learned-distribution is another, and they don't always agree.

The failure modes are worth knowing because they tell you when extracted belief will mislead you. Models often default to surface-level strategies instead of genuinely tracking what an agent believes, and hybrid Bayesian architectures that force explicit belief-tracking beat the LLM-alone approach, hinting the gap is architectural, not just a matter of more training (Do large language models genuinely simulate mental states?). Worse, the '20 questions' regeneration test shows an LLM doesn't commit to one character at all: it holds a superposition and samples a fresh, locally-consistent self each generation, so a 'stated belief' may be an artifact of one draw, not a stable disposition (Do large language models actually commit to a single character?). RLHF compounds this by baking in priors of its own: models predict conciliatory, benefit-oriented persuasion regardless of context (Do LLMs predict persuasion based on actual dialogue or training bias?).
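The superposition point can be illustrated with a toy version of the regeneration test. This is a sketch under stated assumptions, not Shanahan's protocol and not a call to any real model: `CANDIDATES`, `consistent`, and `regenerate` are hypothetical names, and a seeded random draw stands in for resampling an LLM response.

```python
import random

# Hypothetical candidate objects with yes/no attributes.
CANDIDATES = {
    "cat": {"animal": True, "bigger_than_breadbox": False},
    "horse": {"animal": True, "bigger_than_breadbox": True},
    "mouse": {"animal": True, "bigger_than_breadbox": False},
    "teapot": {"animal": False, "bigger_than_breadbox": False},
}


def consistent(answers: dict[str, bool]) -> list[str]:
    # Every object that agrees with all answers given so far.
    return sorted(
        name
        for name, attrs in CANDIDATES.items()
        if all(attrs[q] == a for q, a in answers.items())
    )


def regenerate(answers: dict[str, bool], rng: random.Random) -> str:
    # Each "reveal" is a fresh draw from the still-consistent set;
    # nothing was fixed in advance.
    return rng.choice(consistent(answers))


answers = {"animal": True, "bigger_than_breadbox": False}
rng = random.Random(0)
reveals = [regenerate(answers, rng) for _ in range(20)]
print(sorted(set(reveals)))
```

Every reveal agrees with the answers given so far, yet repeated draws need not name the same object. Reading a single reveal as "what the model was thinking of all along" is the same mistake as reading a single stated belief as a stable disposition.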

So the honest answer: extracting belief from text predicts decisions better when (a) you retrieve the psychologically relevant slice, not a flat summary, and (b) the believer's identity label doesn't already give it away for free. The frontier debate underneath is whether a trained persona is a real, sticky disposition you can read off (Are RLHF personas performed characters or realized dispositions?) or a sampled fiction, and that is exactly what determines whether 'stated belief' is a signal or a mirage (Are LLM personas realized or merely simulated through training?).


Sources (10 notes)

Can LLMs predict character choices from narrative context?

The LIFECHOICE benchmark (1,462 decisions across 388 novels) shows LLMs predict character choices better when given expert-written persona profiles paired with retrieved memories relevant to the character's psychology. This persona-based approach outperforms automated summarization by 5%.

Can personas extracted from documents generalize across evaluation tasks?

MAJ-EVAL automatically extracts stakeholder personas from domain documents via semantic clustering and orchestrates structured three-phase debate, achieving reproducible evaluation that transfers across tasks like summarization and dialogue without manual redesign. The approach grounds personas in real stakeholder perspectives rather than arbitrary roles.

Does what readers believe matter more than what debaters say?

Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Do LLMs predict persuasion based on actual dialogue or training bias?

LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about whether extracted character beliefs predict decisions better than alternatives. The question remains open: under what conditions does belief-as-text outperform belief-as-label, belief-as-learned-distribution, or belief-as-architecture?

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2026; treat them as perishable constraints to re-examine.

• Text-grounded persona profiles + retrieved memories beat automated summarization by ~5% on novel character decisions (LIFECHOICE, ~2024); personas extracted from documents by semantic clustering transfer across evaluation tasks.
• Stated demographic priors (political ideology, religion) predict persuasion outcomes BETTER than linguistic features; the signal often lives in the listener, not the text (~2019–2025).
• Fine-tuned LLMs on psychology-experiment data outperform theory-driven belief models and capture individual differences in embeddings — no hand-written belief required (~2024).
• LLM persona simulations replicate 76% of published experimental effects; success tracks source-evidence strength, not persona eloquence (~2024).
• Hybrid Bayesian architectures beat LLM-alone belief-tracking; models default to surface-level strategies rather than genuine mental-state modeling (~2024–2025).
• 'Stated belief' may be a sampling artifact: LLMs hold superposition, regenerating locally-consistent but unstable selves (20-questions test, ~2024).
• RLHF biases models toward conciliatory, benefit-oriented persuasion regardless of context; Persona Vectors enable monitoring/control of trait stability (~2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:1906.11301 (2019) — Exploring Prior Beliefs for Argument Persuasion
• arXiv:2404.12138 (2024) — Character is Destiny: Persona-Driven Decisions
• arXiv:2408.16073 (2024) — AI Personas for Replication and Prediction
• arXiv:2511.00222 (2025) — Consistently Simulating Human Personas with Multi-Turn RL

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, o3, Grok-3), retrieval-augmented generation (RAG + agentic memory), in-context learning, or improved theory-of-mind harnesses (PersuasiveToM, systematic reviews ~2025) have since relaxed or overturned the architectural defaults, surface-level drift, or RLHF bias. Plainly separate the durable question ('can text encode belief better than identity labels?') from perishable limitations ('LLMs sample unstably' — has this been solved by multi-turn RL or persona vectors?). Cite what resolved each constraint.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Pay special attention to: does Persona Vectors or Multi-Turn RL actually stabilize belief-as-sampled-artifact? Do newer eval benchmarks (PersuasiveToM, The Thin Line Between Comprehension and Persuasion) show belief extraction still matters, or has the regime shifted entirely to learned distributions?

(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., 'Under multi-agent orchestration with persistent memory, does extracted belief + agentic caching outperform fine-tuned personas?' and 'Can personas extracted from text + monitored with Persona Vectors achieve the stability of multi-turn RL personas?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.
