Stable and Explainable Personality Trait Evaluation in Large Language Models with Internal Activations
By: Xiaoxu Ma, Xiangbo Zhang, Zhenyu Weng
Evaluating personality traits in Large Language Models (LLMs) is key to model interpretation, comparison, and responsible deployment. However, existing questionnaire-based evaluation methods exhibit limited stability and offer little explainability, as their results are highly sensitive to minor variations in prompt phrasing or role-play configurations. To address these limitations, we propose an internal-activation-based approach, termed Persona-Vector Neutrality Interpolation (PVNI), for stable and explainable personality trait evaluation in LLMs. PVNI extracts a persona vector associated with a target personality trait from the model's internal activations using contrastive prompts. It then estimates the corresponding neutrality score by interpolating the neutral-prompt representation along the persona vector, which serves as an anchor axis, enabling an interpretable comparison between the neutral representation and the persona direction. We provide a theoretical analysis of the effectiveness and generalization properties of PVNI. Extensive experiments across diverse LLMs demonstrate that PVNI yields substantially more stable personality trait evaluations than existing methods, even under variations in questionnaires and role-play configurations.
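The persona-vector extraction and interpolation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the activation arrays (`pos_acts`, `neg_acts`, `neutral_act`) are synthetic stand-ins for hidden-state activations that would in practice be collected from an LLM under contrastive and neutral prompts, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden-state dimensionality

# Synthetic stand-ins for internal activations gathered under
# trait-positive, trait-negative, and neutral prompts.
pos_acts = rng.normal(1.0, 0.5, (8, d))
neg_acts = rng.normal(-1.0, 0.5, (8, d))
neutral_act = rng.normal(0.1, 0.5, d)

# Persona vector: difference of mean activations from contrastive prompts.
mu_pos = pos_acts.mean(axis=0)
mu_neg = neg_acts.mean(axis=0)
persona_vec = mu_pos - mu_neg

# Interpolate the neutral representation along the persona axis:
# project (neutral - negative pole) onto the persona direction, so that
# t = 0 corresponds to the negative pole and t = 1 to the positive pole.
t = np.dot(neutral_act - mu_neg, persona_vec) / np.dot(persona_vec, persona_vec)
score = float(np.clip(t, 0.0, 1.0))
print(f"neutrality score along persona axis: {score:.3f}")
```

The projection-based interpolation is one natural reading of "interpolating along the persona vector as an anchor axis"; the paper's actual estimator may differ.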