INQUIRING LINE

Why do marginal effects fail to replicate in AI persona simulations?

This explores why AI persona simulations can reproduce strong, headline experimental results but break down on the small, conditional effects — and what that failure reveals about what these simulations actually track.


The most direct evidence in the corpus is striking: when AI personas were run against 111 published marketing experiments, they reproduced 84 main effects (about 76%), and replication success was strongly correlated with the strength of the original study's p-value (see "Can AI personas reliably replicate human experiment results?"). Marginal effects, the weaker, conditional, interaction-style findings, showed both false positives and false negatives. So the failure isn't random: the simulation tracks *strength of evidence*, and marginal effects are, almost by definition, the low-signal cases.

The mechanism becomes clearer when you look at run-to-run stability. When the same persona prompt is run repeatedly, the variance across runs matches or exceeds the variance across entirely different personas (see "Why do LLM persona prompts produce inconsistent outputs across runs?"). That means persona outputs carry a built-in noise floor, driven by the model's own uncertainty rather than by any stable social knowledge it draws on. A large main effect sits comfortably above that noise floor and survives. A marginal effect is small enough to be swamped by it, which is exactly why it flickers in and out as false positives and false negatives. On this reading, replication that tracks p-value strength and noise-driven instability are two descriptions of the same thing.
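The noise-floor argument above can be illustrated with a small Monte Carlo sketch. It is not drawn from the cited studies: the effect sizes, noise level, sample size, and the function names `simulate_diff` and `detection_rate` are all assumptions chosen for illustration, and Gaussian noise stands in for run-to-run variance in persona outputs.

```python
import random
import statistics

Z_CRIT = 1.96  # two-sided 5% threshold for a z-test with known noise


def simulate_diff(true_effect, noise_sd, n_per_arm, rng):
    """Observed treated-minus-control mean difference for one simulated study."""
    control = [rng.gauss(0.0, noise_sd) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_effect, noise_sd) for _ in range(n_per_arm)]
    return statistics.fmean(treated) - statistics.fmean(control)


def detection_rate(true_effect, noise_sd=1.0, n_per_arm=50, n_studies=2000, seed=0):
    """Share of simulated studies whose difference clears the noise floor."""
    rng = random.Random(seed)
    se = noise_sd * (2.0 / n_per_arm) ** 0.5  # standard error of the difference
    hits = 0
    for _ in range(n_studies):
        diff = simulate_diff(true_effect, noise_sd, n_per_arm, rng)
        if abs(diff) > Z_CRIT * se:
            hits += 1
    return hits / n_studies


strong = detection_rate(0.8)  # hypothetical headline effect
weak = detection_rate(0.1)    # hypothetical marginal effect
null = detection_rate(0.0)    # no true effect: any hit is a false positive
print(f"strong={strong:.3f} weak={weak:.3f} null={null:.3f}")
```

Under these assumptions the strong effect is detected in nearly every simulated study, while the weak effect is detected only slightly more often than a pure-noise null, so a single hit on it is hard to tell apart from a false positive.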

There's a second, sneakier source of distortion: persona-assigned models don't just sample noisily, they reason with a thumb on the scale. Assigning a persona induces identity-congruent bias, with models 90% more likely to accept evidence that matches their assigned identity, and standard prompt-based debiasing fails to remove it (see "Do personas make language models reason like biased humans?"). For a strong effect this barely matters; for a marginal one, a systematic tilt of this size can be enough to manufacture an effect that wasn't there or erase one that was.
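To see why a constant tilt matters mostly for small effects, here is a closed-form sketch using a two-sided z-test. Everything numeric is an assumption: `SE`, `TILT`, and the effect sizes are made up, `detection_prob` is a hypothetical helper, and expressing the reported acceptance bias as an additive shift in effect units is a simplification rather than anything the source measures.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def detection_prob(true_effect, bias, se, z_crit=1.96):
    """Probability that a two-sided z-test flags an effect when the
    measurement is shifted by a constant bias (all values hypothetical)."""
    shift = (true_effect + bias) / se
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - shift)  # flagged as positive
    lower = _STD_NORMAL.cdf(-z_crit - shift)       # flagged as negative
    return upper + lower


SE = 0.1    # assumed standard error of the estimated effect
TILT = 0.2  # assumed size of the identity-congruent tilt, in effect units

strong_tilted = detection_prob(1.0, -TILT, SE)     # strong effect, tilt against it
marginal_clean = detection_prob(0.2, 0.0, SE)      # marginal effect, no tilt
marginal_erased = detection_prob(0.2, -TILT, SE)   # tilt cancels the marginal effect
null_manufactured = detection_prob(0.0, TILT, SE)  # tilt creates an effect from nothing

for name, value in [
    ("strong, tilted against", strong_tilted),
    ("marginal, no tilt", marginal_clean),
    ("marginal, tilted against", marginal_erased),
    ("null, tilted toward", null_manufactured),
]:
    print(f"{name}: {value:.3f}")
```

With these illustrative numbers the strong effect is still flagged almost always, the marginal effect falls from roughly a one-in-two detection chance to the 5% false-positive rate, and a true null is flagged about as often as an untilted marginal effect.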

What ties these together is the question of what the simulation is grounded in. Persona competence often turns out to be an artifact of easy conditions: models look socially capable when one model controls all interlocutors, but fail systematically once agents must reason under private information they don't share (see "Why do LLMs fail when simulating agents with private information?"). Marginal effects in real human studies usually live precisely in those harder regions: subtle, context-dependent, and reliant on the grounding work the model skips. And because naive persona prompting tends to collapse toward the dense, typical center of a population while missing rare-but-consequential configurations (see "Should persona simulation prioritize coverage over statistical matching?"), the very tail conditions that produce marginal effects are under-sampled to begin with.

The thing worth carrying away: a persona simulation that 'replicates 76% of findings' isn't 76% reliable across the board. It is highly reliable on strong effects and, on weak ones, can approach a coin flip. That makes these tools useful for confirming robust phenomena and actively misleading for the frontier cases researchers most want to probe, where the real scientific action usually is.
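A short worked calculation shows how one aggregate figure can hide two very different reliabilities. Only the 84-of-111 count comes from the source; `STRONG_RATE`, `WEAK_RATE`, and both helper functions are hypothetical, and the split between strong and weak effects is not reported in the material summarized here.

```python
def overall_rate(strong_share, strong_rate, weak_rate):
    """Aggregate replication rate for a mix of strong and weak effects."""
    return strong_share * strong_rate + (1.0 - strong_share) * weak_rate


def implied_strong_share(overall, strong_rate, weak_rate):
    """Share of strong effects needed to produce a given aggregate rate."""
    return (overall - weak_rate) / (strong_rate - weak_rate)


reported = 84 / 111  # from the source: 84 of 111 main effects reproduced
STRONG_RATE = 0.95   # assumed reliability on strong effects, not reported
WEAK_RATE = 0.50     # assumed coin-flip reliability on weak effects, not reported

share = implied_strong_share(reported, STRONG_RATE, WEAK_RATE)
print(f"reported rate: {reported:.3f}")
print(f"implied share of strong effects: {share:.3f}")
print(f"recombined rate: {overall_rate(share, STRONG_RATE, WEAK_RATE):.3f}")
```

The point is only that a single aggregate figure is compatible with very different per-stratum reliabilities; other assumed rates would imply a different mix.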


Sources (5 notes)

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Do personas make language models reason like biased humans?

Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.

Why do LLMs fail when simulating agents with private information?

Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.

Should persona simulation prioritize coverage over statistical matching?

Evolutionary optimization of Persona Generator code achieves broader trait coverage than density-matched baselines, including rare but consequential user configurations that naive LLM prompting misses.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-examining why marginal effects fail to replicate in AI persona simulations. This remains an open question; treat the findings below as dated claims to be re-tested, not current truth.

What a curated library found — and when (findings span 2020–2026, but concentrate on 2024–present):
- AI personas reproduced 84/111 main effects from published marketing experiments, with replication success strongly tied to original p-value strength; marginal effects showed both false positives and negatives (2024-08).
- Run-to-run variance in persona outputs matches or exceeds cross-persona variance, indicating a noise floor driven by model uncertainty rather than stable social knowledge (2024-03).
- Persona assignment induces identity-congruent motivated reasoning; prompt-based debiasing fails to remove the systematic tilt (2025-06, 2026-01).
- Persona simulations collapse toward population density while under-sampling rare-but-consequential configurations that produce marginal effects (2025-06).
- Omniscient social simulation (one model controls all interlocutors) fails under real-world private information asymmetry, where marginal effects typically live (2024-03).

Anchor papers (verify; mind their dates):
- arXiv:2403.05020 (2024-03) — foundational replication failure analysis
- arXiv:2408.16073 (2024-08) — 76% replication on main effects
- arXiv:2506.20020 (2025-06) — motivated reasoning under persona assignment
- arXiv:2601.10387 (2026-01) — default persona stabilization attempts

Your task:
(1) RE-TEST EACH CONSTRAINT: For noise floors, motivated reasoning, and information asymmetry—has recent work (SDKs, multi-agent harnesses, reinforcement learning fine-tuning, or better prompting) since raised the floor on marginal-effect detection? Separate the durable question (why simulations still miss conditional effects) from the perishable limitation (whether *this particular mechanism* still dominates). Cite what has or hasn't changed.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially anything claiming marginal-effect replication or a mechanistic fix to noise-driven instability.
(3) Propose 2 research questions that assume the regime may have moved: e.g., "Do multi-turn RL or memory-augmented persona agents now stabilize marginal-effect replication?" or "Has ensemble or calibration-aware persona design closed the gap?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
