Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy

Paper · arXiv 2512.05858 · Published December 5, 2025
Personas and Personality

Summary This is the fourth in a series of short reports that help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. Here, we ask whether assigning personas to models improves performance on difficult objective multiple-choice questions. We study both domain-specific expert personas and low-knowledge personas, evaluating six models on GPQA Diamond (Rein et al. 2024) and MMLU-Pro (Wang et al. 2024), graduate-level questions spanning science, engineering, and law. We tested three approaches: ●​ In-Domain Experts: Assigning the model an expert persona (“you are a physics expert”) matched to the problem type (physics problems) had no significant impact on performance (with the exception of the Gemini 2.0 Flash model). ●​ Off-Domain Experts (Domain-Mismatched): Assigning the model an expert persona (“you are a physics expert”) not matched to the problem type (law problems) resulted in marginal differences. ●​ Low-Knowledge Personas: We assigned the model negative capability personas (layperson, young child, toddler), which were generally harmful to benchmark accuracy. Across both benchmarks, persona prompts generally did not improve accuracy relative to a no-persona baseline.

Introduction. Why Persona Prompts?​ The official documentation for many AI models recommends persona prompting, where the AI is assigned a specific role (“act like a”). For example, Google’s Vertex AI prompt-design guide says to “Assign a role” as a best practice and provides a sample template that begins, “You are a [persona, such as a ‘math teacher’ or ‘automotive expert’].” (Google Cloud 2024) Anthropic’s samples include, “You are an expert AI tax analyst. You help users understand the details of the tax code.” (Anthropic 2024). OpenAI’s developer materials take a similar approach and include “You are a world-class Python developer... (Sanders 2022). Persona prompts may serve many purposes (such as making an AI respond in a particular style or simulate a particular viewpoint), but many guides suggest that they may improve the quality of output when models are given objective questions to answer.

Discussion / Conclusion. Our findings indicate that, for difficult factual questions of the kind captured by GPQA Diamond and MMLU-Pro, assigning an expert persona is not a reliable way to improve accuracy. Across six models and two benchmarks, persona prompts leave performance unchanged relative to a no-persona baseline in most cases. When persona prompts do matter, they are more likely to reduce accuracy than to increase it. The main exception is Gemini 2.0 Flash on MMLU-Pro, which shows improvements for all five expert personas overall and, at the domain level, in Engineering and Chemistry; GPT-4o also shows a small in-domain gain in Law. In contrast, low-knowledge personas (especially “Toddler”) often reduce accuracy, with “Layperson” effects varying by model (harmful in several cases, but not all). Overall, tailoring personas to match the question domain shows no consistent benefit across models, with the limited improvements above appearing model and question-specific rather than generalizable. A notable failure mode emerges from the domain-mismatched expert conditions in the Gemini Flash family.