INQUIRING LINE

Why do LLM persona simulations replicate main effects but fail on marginal effects?

This explores why LLM-simulated personas reliably reproduce the strong, headline findings of human experiments (main effects) but stumble on the subtle, conditional ones (marginal effects) — and what that gap reveals about what these simulations are actually doing.


The clearest data point in the corpus is direct: when AI personas re-ran published marketing experiments, they reproduced 76% of main effects (84 of 111), and, crucially, replication success tracked the *strength of the original evidence* (the p-value) [Can AI personas reliably replicate human experiment results?]. Main effects are the loud, robust signals. Marginal effects are the quiet ones, and there the same study found both false positives and false negatives. So the first answer is almost mechanical: the model reproduces effects in proportion to how strongly they are written into the patterns it learned from, and faint effects don't survive that filter.
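
A toy model makes that "filter" concrete. It is an illustrative assumption, not the study's analysis: treat each persona re-run as a fresh noisy estimate of the true effect with standard error `se`, and count a replication when that estimate is significant in the original direction. The function name `replication_probability` is hypothetical.

```python
from statistics import NormalDist


def replication_probability(effect: float, se: float, alpha: float = 0.05) -> float:
    """Probability that a noisy re-run reproduces an effect (toy model).

    Assumes the simulated estimate ~ Normal(effect, se) and counts a replication
    when estimate / se exceeds the two-sided critical value in the original
    (positive) direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    signal_to_noise = effect / se
    return 1 - NormalDist().cdf(z_crit - signal_to_noise)


if __name__ == "__main__":
    # Sweep effect sizes expressed in units of the re-run's standard error.
    for snr in (0.5, 1.0, 2.0, 3.0, 4.0):
        p = replication_probability(snr, 1.0)
        print(f"effect/se = {snr:.1f} -> P(replicate) = {p:.2f}")
```

Under this model an effect four standard errors out replicates roughly 98% of the time, while one a single standard error out replicates only about 17% of the time, which mirrors the study's finding that replication tracked the strength of the original evidence.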

But the corpus suggests the deeper reason is *where the variance comes from*. When you run the same persona prompt repeatedly, the output varies as much across reruns of one persona as it does across genuinely different personas [Why do LLM persona prompts produce inconsistent outputs across runs?]. Differences between personas, which is where conditional effects live, are therefore no larger than the model's own run-to-run noise, so the noise floor is roughly the size of a marginal effect. A main effect is big enough to poke through that floor; a marginal effect is about the same magnitude as the noise, so it gets drowned out. The model isn't simulating a stable person with subtle conditional preferences. It is sampling from its own uncertainty, and subtle structure is exactly what uncertainty erases.
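
One way to check this on your own runs is to split the variance in the spirit of a one-way ANOVA: average the variance across reruns within each persona (the model's noise) and compare it with the variance of the per-persona means (the persona signal). This is a minimal sketch; `variance_split` and the synthetic scores are hypothetical, and the cited work's exact metric may differ.

```python
import random
from statistics import fmean, pvariance


def variance_split(runs_by_persona: dict[str, list[float]]) -> tuple[float, float]:
    """Return (within, between) variance for repeated runs of several personas.

    within:  mean variance across reruns of the same persona (run-to-run noise)
    between: variance of the per-persona means (persona signal; note it still
             carries residual noise of roughly within / n_runs)
    """
    within = fmean(pvariance(runs) for runs in runs_by_persona.values())
    between = pvariance([fmean(runs) for runs in runs_by_persona.values()])
    return within, between


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setup: 5 personas whose "true" scores are 0.1 apart,
    # each re-run 20 times with sampling noise of standard deviation 1.0.
    runs = {
        f"persona_{i}": [0.1 * i + rng.gauss(0.0, 1.0) for _ in range(20)]
        for i in range(5)
    }
    within, between = variance_split(runs)
    print(f"within-persona variance:  {within:.3f}")
    print(f"between-persona variance: {between:.3f}")
```

When `within` is comparable to or larger than `between`, a persona difference of the size built in here sits inside the noise; only a much larger gap, the analogue of a main effect, would show up reliably.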

There's a statistical version of the same story that explains the false positives, not just the misses. Persona generation relies on heuristics that can't recover the *true joint distribution* from marginal data (here "marginal" means per-variable population distributions, not marginal effects): the generator knows the population averages but has to invent the interactions [How do we generate realistic personas at population scale?]. Marginal effects often *are* interactions (this group responds differently under that condition), so a model that fakes the joint distribution will confidently produce conditional effects that aren't real. And conditioning on a specific profile doesn't rescue you: across 208,021 participants, feeding LLMs personal profiles gave no measurable gain in predicting individuals [Does conditioning LLMs on personal profiles improve prediction?]. The lever you'd reach for to capture fine-grained differences turns out not to move individual-level predictions at all.
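
A two-trait toy example shows why marginals cannot pin down the joint. Below, a hypothetical population in which trait B depends strongly on trait A is compared with a reconstruction built only from each trait's marginal rate under an independence assumption (one simple heuristic among many; the names and numbers are illustrative). Both have identical marginals, yet the conditional effect differs: independence erases a real interaction here, and a different guess could just as easily invent one that isn't there.

```python
from fractions import Fraction
from itertools import product

# A joint distribution over two binary traits: (a, b) -> probability.
Joint = dict[tuple[int, int], Fraction]


def marginals(joint: Joint) -> tuple[Fraction, Fraction]:
    """Return (P(A=1), P(B=1)), the only information a marginal-based generator sees."""
    pa = sum(p for (a, _), p in joint.items() if a == 1)
    pb = sum(p for (_, b), p in joint.items() if b == 1)
    return pa, pb


def independent_joint(pa: Fraction, pb: Fraction) -> Joint:
    """Heuristic reconstruction: assume the traits are independent."""
    return {
        (a, b): (pa if a else 1 - pa) * (pb if b else 1 - pb)
        for a, b in product((0, 1), repeat=2)
    }


def conditional_effect(joint: Joint) -> Fraction:
    """P(B=1 | A=1) - P(B=1 | A=0): the interaction a study would measure."""
    p_a1 = joint[(1, 0)] + joint[(1, 1)]
    p_a0 = joint[(0, 0)] + joint[(0, 1)]
    return joint[(1, 1)] / p_a1 - joint[(0, 1)] / p_a0


# Hypothetical "true" population: B is much more common when A is present.
true_joint: Joint = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}
rebuilt = independent_joint(*marginals(true_joint))

if __name__ == "__main__":
    print("marginals:", marginals(true_joint), marginals(rebuilt))
    print("conditional effect:", conditional_effect(true_joint), conditional_effect(rebuilt))
```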

What makes the failure *systematic* rather than random is that personas don't just blur; they bend in a direction. Assigning an identity induces motivated reasoning: models become ~90% more likely to accept evidence congruent with their assigned identity, and standard debiasing prompts don't fix it, which suggests the bias sits below the instruction layer [Do personas make language models reason like biased humans?]. Combine identity-congruent bias with models that stubbornly resist personality conditioning in the first place [Can open language models adopt different personalities through prompting?], and you get a tilt that distorts precisely the small, conditional effects while leaving the big effects standing: a shift of a given size is a modest fraction of a large effect but can erase or reverse a small one. The summary note ties these threads together as three named failure modes: run-to-run instability, resistance to personality conditioning, and identity-congruent bias [How accurately can language models simulate human personalities?].
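
The asymmetry follows from simple arithmetic if you model identity-congruent bias as one fixed directional shift applied to every simulated effect. That is a deliberate simplification, and the effect sizes, the tilt, and the names `apply_tilt` and `TiltedEffect` are hypothetical.

```python
from typing import NamedTuple


class TiltedEffect(NamedTuple):
    original: float
    shifted: float
    sign_kept: bool             # does the simulated effect still point the same way?
    relative_distortion: float  # |tilt| as a fraction of the original effect size


def apply_tilt(effects: dict[str, float], tilt: float) -> dict[str, TiltedEffect]:
    """Shift every (non-zero) effect by the same directional tilt (toy model of persona bias)."""
    result = {}
    for name, effect in effects.items():
        shifted = effect + tilt
        result[name] = TiltedEffect(
            original=effect,
            shifted=shifted,
            sign_kept=(shifted > 0) == (effect > 0),
            relative_distortion=abs(tilt) / abs(effect),
        )
    return result


if __name__ == "__main__":
    # Hypothetical standardized effects and a hypothetical bias of -0.15.
    for name, t in apply_tilt({"main": 0.8, "conditional": 0.1}, tilt=-0.15).items():
        print(f"{name}: {t.original:+.2f} -> {t.shifted:+.2f}, "
              f"sign kept={t.sign_kept}, distortion={t.relative_distortion:.0%}")
```

With these numbers the main effect keeps its sign and moves by under a fifth of its size, while the conditional effect flips sign.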

The thing worth taking away: the main-effect successes and the marginal-effect failures aren't two separate facts; they're the same fact seen from two sides. These systems are good at reproducing the average and bad at reproducing the *structure around* the average, because they capture marginal distributions but fabricate the joints, and their internal noise is large enough to wash out anything subtle. If you want to push on whether that's a fixable calibration problem or a deeper limit, the calibration-science argument [How do we generate realistic personas at population scale?] and the realizationist claim that personas are genuine installed dispositions [Are LLM personas realized or merely simulated through training?] make an interesting pair to read against each other.


Sources (8 notes)

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments, with replication success strongly correlated with the strength of the original p-value. Marginal effects showed unreliable performance, with both false positives and false negatives.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

How do we generate realistic personas at population scale?

LLM persona generation produces systematic biases in downstream tasks like election forecasting because it relies on heuristic techniques that cannot recover true joint distributions from marginal data. Solving this requires benchmarks, training datasets, and structured frameworks analogous to ImageNet.

Does conditioning LLMs on personal profiles improve prediction?

Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.

Do personas make language models reason like biased humans?

Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

How accurately can language models simulate human personalities?

LLMs replicate human responses at 85% fidelity in interviews and 76% of main effects in marketing studies. However, this accuracy masks three failure modes: run-to-run instability, resistance to personality conditioning, and identity-congruent cognitive biases that distort simulated reasoning.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher re-testing constraints on persona simulation fidelity. The core question: why do LLM-simulated personas reliably replicate *main effects* from human experiments but fail systematically on *marginal (conditional) effects*—and has that constraint shifted?

What a curated library found, and when (findings span 2023–2026; treat dated claims as historical, not current truth):
• LLM personas reproduce 76% of published main effects, success tracking original p-value strength; marginal effects show both false negatives and false positives (2024-08, arXiv:2408.16073).
• Variance across reruns of a single persona matches variance across different personas, implying a noise floor roughly the size of a marginal effect that drowns subtle structure (2024-03, arXiv:2403.05020).
• Persona induction does not improve individual-level prediction across 200k+ participants; conditioning on personal profiles yields no measurable gain (2024-08).
• Persona-assigned models are ~90% more likely to accept identity-congruent evidence; standard debiasing prompts fail, suggesting the bias sits below the instruction layer (2025-06, arXiv:2506.20020).
• Most open LLMs resist personality conditioning, retaining intrinsic defaults that override persona assignment (2024-01, arXiv:2401.07115).

Anchor papers (verify; mind their dates):
• arXiv:2408.16073 (2024-08): 76% main-effect replication baseline.
• arXiv:2403.05020 (2024-03): instability-as-noise-floor argument.
• arXiv:2506.20020 (2025-06): motivated reasoning in persona-assigned models.
• arXiv:2511.00222 (2025-10): multi-turn RL for consistent persona simulation.

Your task:
(1) RE-TEST EACH CONSTRAINT. For the three failure modes (instability, conditioning resistance, identity-congruent bias): judge whether newer training (e.g., persona RL, arXiv:2511.00222), evaluation harnesses (multi-agent judges, arXiv:2507.21028), or architectural controls (persona vectors, arXiv:2507.21509) have since *relaxed* the noise floor, *overturned* resistance to conditioning, or *decoupled* identity bias from instruction following. Cite what resolved each, and state plainly where constraints still hold.
(2) Surface the strongest *contradicting* or *superseding* work from the last ~6 months—especially anything claiming stable persona embedding or successful marginal-effect replication.
(3) Propose 2 research questions that *assume the regime may have moved*: e.g., "Can multi-turn RL close the main-to-marginal gap, and if so, at what compute/data cost?" or "Do persona vectors enable independent control of marginal vs. main effects?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
