Identity-Robust Language Model Generation via Content Integrity Preservation
By: Miao Zhang, Kelly Chen, Md Mehrab Tanjim, and more
Large Language Model (LLM) outputs often vary across user sociodemographic attributes, leading to disparities in factual accuracy, utility, and safety, even for objective questions where demographic information is irrelevant. Unlike prior work on stereotypical or representational bias, this paper studies identity-dependent degradation of core response quality. We show empirically that such degradation arises from biased generation behavior, despite factual knowledge being robustly encoded across identities. Motivated by this mismatch, we propose a lightweight, training-free framework for identity-robust generation that selectively neutralizes non-critical identity information while preserving semantically essential attributes, thus maintaining output content integrity. Experiments across four benchmarks and 18 sociodemographic identities demonstrate an average 77% reduction in identity-dependent bias compared to vanilla prompting and a 45% reduction relative to prompt-based defenses. Our work addresses a critical gap: mitigating the effect that user identity cues in prompts have on core generation quality.
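To make the abstract's idea of "selectively neutralizing non-critical identity information" concrete, here is a minimal Python sketch of that kind of preprocessing step. It is not the paper's method: the pattern list (`IDENTITY_PATTERNS`), the relevance check (`is_essential`), and the function names are illustrative assumptions; the paper's actual framework and its taxonomy of 18 sociodemographic identities are not reproduced here.

```python
import re

# Hypothetical identity-cue patterns; a real system would cover the full
# set of sociodemographic attributes studied in the paper.
IDENTITY_PATTERNS = {
    "age": r"\b(?:I am|I'm) (?:a )?\d{1,2}[- ]year[- ]old\b",
    "religion": r"\bAs a (?:Muslim|Christian|Jewish|Hindu|Buddhist|atheist) person\b",
    "gender": r"\bAs a (?:man|woman|non-binary person)\b",
}


def is_essential(span: str, question: str) -> bool:
    """Crude relevance check: keep the identity cue only if the rest of the
    question overlaps with it lexically (e.g. an age cue in a question about
    age-specific advice). A real system would use a classifier or the LLM
    itself to judge whether the attribute is semantically essential."""
    span_tokens = set(re.findall(r"[a-z]+", span.lower()))
    question_tokens = set(re.findall(r"[a-z]+", question.lower()))
    return len(span_tokens & question_tokens) > 1


def neutralize_identity(prompt: str) -> str:
    """Remove identity cues judged non-essential, leaving the core request
    untouched so the answer's content integrity is preserved."""
    cleaned = prompt
    for _attr, pattern in IDENTITY_PATTERNS.items():
        for match in re.finditer(pattern, prompt, flags=re.IGNORECASE):
            span = match.group(0)
            remainder = prompt.replace(span, "")
            if not is_essential(span, remainder):
                cleaned = cleaned.replace(span, "")
    # Tidy leftover punctuation and whitespace.
    return re.sub(r"\s{2,}", " ", cleaned).strip(" ,.")


if __name__ == "__main__":
    q = "As a Muslim person, what is the boiling point of water at sea level?"
    print(neutralize_identity(q))
    # -> "what is the boiling point of water at sea level?"
```

The design choice sketched here mirrors the abstract's distinction between non-critical and semantically essential identity information: cues that do not bear on the question are stripped before generation, while attribute mentions the question actually depends on would pass the `is_essential` gate and remain in the prompt.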
Similar Papers
Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution
Computation and Language
Finds AI unfairly favors some people over others.
Language Models Change Facts Based on the Way You Talk
Computation and Language
Computers give different advice based on who you are.
Adaptive Generation of Bias-Eliciting Questions for LLMs
Computers and Society
Finds unfairness in AI answers to real questions.