Can LLMs Infer Personality from Real World Conversations?
By: Jianfeng Zhu, Ruoming Jin, Karin G. Coifman
Potential Business Impact:
Computers can't accurately guess your personality yet.
Large Language Models (LLMs) such as OpenAI's GPT-4 and Meta's LLaMA offer a promising approach to scalable personality assessment from open-ended language. However, inferring personality traits remains challenging, and earlier work often relied on synthetic data or social media text, neither of which has established psychometric validity. We introduce a real-world benchmark of 555 semi-structured interviews paired with BFI-10 self-report scores for evaluating LLM-based personality inference. Three state-of-the-art LLMs (GPT-4.1 Mini, Meta-LLaMA, and DeepSeek) were tested using zero-shot prompting for BFI-10 item prediction and both zero-shot and chain-of-thought prompting for Big Five trait inference. All models showed high test-retest reliability, but construct validity was limited: correlations with ground-truth self-report scores were weak (maximum Pearson's $r = 0.27$), interrater agreement was low (Cohen's $\kappa < 0.10$), and predictions were biased toward moderate or high trait levels. Chain-of-thought prompting and longer input context modestly improved distributional alignment but not trait-level accuracy. These results underscore the limitations of current LLM-based personality inference and highlight the need for evidence-based development before such models are deployed in psychological applications.
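To make the reported evaluation concrete, here is a minimal sketch of how predictions could be scored against BFI-10 self-reports using the two statistics the abstract cites: Pearson's $r$ for construct validity and Cohen's $\kappa$ for agreement. The data and variable names below are illustrative assumptions, not the authors' released code or dataset.

```python
# Minimal evaluation sketch (hypothetical data; not the paper's code).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical arrays for 555 participants on one trait, 1-5 scale:
# self-reported BFI-10 scores (ground truth) and LLM-predicted scores.
rng = np.random.default_rng(0)
self_report = rng.integers(1, 6, size=555)
llm_predicted = np.clip(self_report + rng.integers(-2, 3, size=555), 1, 5)

# Construct validity: Pearson correlation between predictions and
# ground truth (the paper reports a maximum of r = 0.27).
r, p = pearsonr(llm_predicted, self_report)
print(f"Pearson's r = {r:.2f} (p = {p:.3g})")

# Agreement: Cohen's kappa on the discrete item levels
# (the paper reports kappa < 0.10).
kappa = cohen_kappa_score(llm_predicted, self_report)
print(f"Cohen's kappa = {kappa:.2f}")
```

With real data, `self_report` and `llm_predicted` would be aligned per participant and per trait, and both statistics would be computed separately for each of the five Big Five dimensions, as the paper's trait-level results suggest.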
Similar Papers
Investigating Large Language Models in Inferring Personality Traits from User Conversations
Computation and Language
AI guesses your personality from talking.
Evaluating LLM Alignment on Personality Inference from Real-World Interview Data
Computation and Language
Computers can't guess your personality from talking.
Mind Reading or Misreading? LLMs on the Big Five Personality Test
Computation and Language
Helps computers guess your personality from writing.