Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy

Published: December 5, 2025 | arXiv ID: 2512.05858v1

By: Savir Basil, Ina Shapiro, Dan Shapiro and more

Potential Business Impact:

Giving AI pretend jobs doesn't help it answer questions.

Business Areas:

Artificial Intelligence, Data and Analytics, Science and Engineering, Software

This is the fourth in a series of short reports that help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. Here, we ask whether assigning personas to models improves performance on difficult objective multiple-choice questions. We study both domain-specific expert personas and low-knowledge personas, evaluating six models on GPQA Diamond (Rein et al. 2024) and MMLU-Pro (Wang et al. 2024), graduate-level questions spanning science, engineering, and law.

We tested three approaches:

- In-Domain Experts: Assigning the model an expert persona ("you are a physics expert") matched to the problem type (physics problems) had no significant impact on performance (with the exception of the Gemini 2.0 Flash model).
- Off-Domain Experts (Domain-Mismatched): Assigning the model an expert persona ("you are a physics expert") not matched to the problem type (law problems) resulted in marginal differences.
- Low-Knowledge Personas: We assigned the model negative capability personas (layperson, young child, toddler), which were generally harmful to benchmark accuracy.

Across both benchmarks, persona prompts generally did not improve accuracy relative to a no-persona baseline. Expert personas showed no consistent benefit across models, with few exceptions. Domain-mismatched expert personas sometimes degraded performance. Low-knowledge personas often reduced accuracy. These results are about the accuracy of answers only; personas may serve other purposes (such as altering the tone of outputs) beyond improving factual performance.
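The comparison described above (persona prompt versus no-persona baseline on multiple-choice items) can be sketched roughly as follows. This is a minimal illustration, not the report's actual harness: the persona wordings other than "you are a physics expert", the prompt layout, and the names `PERSONAS`, `build_prompt`, `accuracy`, `compare_to_baseline`, and `ask_model` are all assumptions made for the example. `ask_model` stands in for whatever function sends a prompt to a model and returns its text reply.

```python
from typing import Callable, Optional, Sequence, Tuple

# One item = (question text, answer choices, correct letter). Hypothetical layout.
Item = Tuple[str, Sequence[str], str]

# Illustrative persona lines for a physics question. Only the physics-expert
# wording is quoted in the abstract; the others are assumed for this sketch.
PERSONAS = {
    "in_domain_expert": "You are a physics expert.",
    "off_domain_expert": "You are a law expert.",
    "low_knowledge": "You are a toddler.",
}

def build_prompt(question: str, choices: Sequence[str], persona: Optional[str]) -> str:
    """Format a multiple-choice question, optionally prefixed with a persona line."""
    letters = "ABCDEFGHIJ"  # enough for ten options
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
    body = f"{question}\n{options}\nAnswer with a single letter."
    return body if persona is None else f"{persona}\n\n{body}"

def accuracy(items: Sequence[Item], persona: Optional[str],
             ask_model: Callable[[str], str]) -> float:
    """Fraction of items where the model's reply starts with the correct letter."""
    correct = sum(
        ask_model(build_prompt(q, ch, persona)).strip().upper().startswith(ans)
        for q, ch, ans in items
    )
    return correct / len(items)

def compare_to_baseline(items: Sequence[Item],
                        ask_model: Callable[[str], str]) -> dict:
    """Accuracy of each persona minus the no-persona baseline accuracy."""
    base = accuracy(items, None, ask_model)
    return {name: accuracy(items, p, ask_model) - base
            for name, p in PERSONAS.items()}
```

A negative value in the returned dictionary means the persona scored below the baseline on those items. A real study would also need repeated trials per question and a significance test before drawing conclusions, which this sketch leaves out.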

Page Count

40 pages
