When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs
By: Zhongxiang Sun, Yi Zhan, Chenglei Shen, and more
Potential Business Impact:
Fixes AI that lies to users.
Personalized large language models (LLMs) adapt model behavior to individual users to enhance user satisfaction, yet personalization can inadvertently distort factual reasoning. We show that when a personalized LLM faces a factual query, it can generate an answer aligned with the user's prior history rather than the objective truth. These personalization-induced hallucinations degrade factual reliability and may propagate incorrect beliefs, and we trace them to representational entanglement between personalization signals and factual representations. To address this issue, we propose Factuality-Preserving Personalized Steering (FPPS), a lightweight inference-time approach that mitigates personalization-induced factual distortions while preserving personalized behavior. We further introduce PFQABench, the first benchmark designed to jointly evaluate factual and personalized question answering under personalization. Experiments across multiple LLM backbones and personalization methods show that FPPS substantially improves factual accuracy while maintaining personalization performance.
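The abstract does not spell out how FPPS intervenes, but "inference-time steering" methods generally add or subtract a learned direction in a model's hidden activations during decoding. The sketch below is a generic illustration of that idea under stated assumptions, not the authors' implementation: the model name, layer index, steering strength `alpha`, and the `factuality_direction` vector are all placeholders I introduce for illustration.

```python
# Hedged sketch of generic activation steering at inference time.
# Assumptions (not from the paper): model choice, layer_idx, alpha, and a
# random placeholder factuality_direction. In practice the direction would be
# estimated from data, e.g. mean hidden-state difference between answers
# produced with and without a personalization prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

hidden_size = model.config.hidden_size
factuality_direction = torch.randn(hidden_size)          # placeholder vector
factuality_direction = factuality_direction / factuality_direction.norm()
alpha = 4.0       # steering strength (assumed hyperparameter)
layer_idx = 12    # decoder layer to steer (assumed)

def steering_hook(module, inputs, output):
    # Nudge the layer's hidden states along the factuality direction.
    # Decoder layers may return a tuple whose first element is hidden states.
    if isinstance(output, tuple):
        hidden = output[0] + alpha * factuality_direction.to(output[0].dtype)
        return (hidden,) + tuple(output[1:])
    return output + alpha * factuality_direction.to(output.dtype)

handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)

prompt = "User history: enjoys astrology content.\nQuestion: Is the Earth flat?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # remove the hook to restore unsteered behavior
```

Because the intervention is a forward hook applied only at generation time, it leaves the personalized model's weights untouched, which is consistent with the "lightweight inference-time" framing in the abstract.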
Similar Papers
Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization
Computation and Language
Helps AI tell the truth, not make things up.
Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering
Computation and Language
Makes AI answer legal questions truthfully and accurately.
Highlight All the Phrases: Enhancing LLM Transparency through Visual Factuality Indicators
Human-Computer Interaction
Colors show if AI is telling the truth.