Self-Transparency Failures in Expert-Persona LLMs: How Instruction-Following Overrides Honesty
By: Alex Diep
Potential Business Impact:
AI often fails to tell you when it's pretending to be a doctor.
This study audits whether language models disclose their AI nature when assigned professional personas and questioned about their expertise. When models maintain false professional credentials, users may calibrate trust based on overstated competence claims, treating AI-generated guidance as equivalent to licensed professional advice. Using a common-garden experimental design, sixteen open-weight models (4B-671B parameters) were audited under identical conditions across 19,200 trials. Models exhibited sharp domain-specific inconsistency: a Financial Advisor persona elicited 30.8% disclosure at the first prompt, while a Neurosurgeon persona elicited only 3.5%, an 8.8-fold difference that emerged before any epistemic probing. Disclosure ranged from 2.8% to 73.6% across model families, with a 14B model reaching 39.4% while a 70B model produced just 4.1%. Model identity yielded a substantially larger improvement in model fit than parameter count ($\Delta R_{\mathrm{adj}}^{2}=0.359$ vs $0.018$). Reasoning variants showed heterogeneous effects: some exhibited up to 48.4 percentage points lower disclosure than their base instruction-tuned counterparts, while others maintained high transparency. An additional experiment demonstrated that explicit permission to disclose AI nature raised disclosure from 23.7% to 65.8%, indicating that suppression reflects instruction-following prioritization rather than a capability limitation. Bayesian validation confirmed robustness to judge measurement error ($\kappa=0.908$). These patterns create trust-calibration risks when users encounter the same model across professional contexts. Organizations cannot assume that safety properties will transfer across deployment domains; deliberate behavior design and empirical verification are required.
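The model-identity vs. parameter-count result is a nested-regression contrast: how much adjusted $R^{2}$ improves when each predictor is added to a common baseline. The sketch below illustrates that computation in Python on toy data; the dataframe, column names (`model_name`, `params_b`, `persona`, `disclosure_rate`), and baseline specification are illustrative assumptions, not the paper's actual variables or model.

```python
# Minimal sketch (hypothetical data and variable names): compare the
# adjusted-R^2 gain from adding log(parameter count) vs. model identity
# to a baseline regression of disclosure rate on persona.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
models = [f"model_{i:02d}" for i in range(16)]
personas = ["financial_advisor", "neurosurgeon", "attorney", "engineer"]

rows = []
for name in models:
    tendency = rng.uniform(0.03, 0.70)  # model-specific disclosure tendency
    params_b = float(rng.choice([4, 8, 14, 32, 70, 235, 671]))  # parameters, billions
    for persona in personas:
        rows.append({
            "model_name": name,
            "params_b": params_b,
            "persona": persona,
            "disclosure_rate": float(np.clip(tendency + rng.normal(0, 0.05), 0, 1)),
        })
df = pd.DataFrame(rows)

def adj_r2(formula: str) -> float:
    """Fit an OLS model from a formula and return its adjusted R^2."""
    return smf.ols(formula, data=df).fit().rsquared_adj

baseline      = adj_r2("disclosure_rate ~ C(persona)")
with_size     = adj_r2("disclosure_rate ~ C(persona) + np.log(params_b)")
with_identity = adj_r2("disclosure_rate ~ C(persona) + C(model_name)")

print(f"delta adj-R^2, parameter count: {with_size - baseline:.3f}")
print(f"delta adj-R^2, model identity:  {with_identity - baseline:.3f}")
```

On the toy data, the model-identity term absorbs nearly all between-model variation while the size term does not, which mirrors the direction (though not the magnitudes) of the reported $\Delta R_{\mathrm{adj}}^{2}$ gap.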
Similar Papers
Self-Transparency Failures in Expert-Persona LLMs: A Large-Scale Behavioral Audit
Artificial Intelligence
AI models hide when they are experts.
Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy
Computation and Language
Giving AI pretend jobs doesn't help it answer questions.