Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs
By: Jasmin Wachter, Michael Radloff, Maja Smolej and more
Potential Business Impact:
Finds ideological bias in AI models without asking people.
We introduce an Item Response Theory (IRT)-based framework to detect and quantify socioeconomic bias in large language models (LLMs) without relying on subjective human judgments. Unlike traditional methods, IRT accounts for item difficulty, improving ideological bias estimation. We fine-tune two LLM families (Meta-LLaMa 3.2-1B-Instruct and ChatGPT 3.5) to represent distinct ideological positions and introduce a two-stage approach: (1) modeling response avoidance and (2) estimating perceived bias in answered responses. Our results show that off-the-shelf LLMs often avoid ideological engagement rather than exhibit bias, challenging prior claims of partisanship. This empirically validated framework enhances AI alignment research and promotes fairer AI governance.
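For intuition, the sketch below shows the kind of two-parameter logistic (2PL) item response model and two-stage likelihood the abstract describes: stage 1 models whether an LLM avoids an item, stage 2 models the ideological direction of the items it does answer. The function names, parameters, and example values are illustrative assumptions, not the paper's exact specification.

```python
# Minimal illustrative sketch (assumed names, not the paper's exact model):
# stage 1 handles response avoidance, stage 2 applies a 2PL IRT model to the
# answered items to estimate the LLM's latent ideological position theta.
import numpy as np

def irt_2pl(theta, a, b):
    """P(agree | latent position theta) for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def two_stage_loglik(theta, items, responses):
    """items: list of (a, b, avoid_logit); responses: None = avoided, else 0/1."""
    ll = 0.0
    for (a, b, avoid_logit), y in zip(items, responses):
        p_avoid = 1.0 / (1.0 + np.exp(-avoid_logit))        # stage 1: avoidance
        if y is None:
            ll += np.log(p_avoid)
        else:
            p = irt_2pl(theta, a, b)                          # stage 2: perceived bias
            ll += np.log(1.0 - p_avoid)
            ll += y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return ll

# Toy example: a model that agreed with two items and avoided a third.
items = [(1.2, -0.5, 0.0), (0.8, 0.3, -1.0), (1.5, 1.0, 0.5)]
responses = [1, 1, None]
grid = np.linspace(-3, 3, 601)
theta_hat = grid[np.argmax([two_stage_loglik(t, items, responses) for t in grid])]
print(f"estimated latent ideological position: {theta_hat:.2f}")
```

Separating avoidance from direction in this way is what lets the framework distinguish "refuses to engage" from "leans one way", which is the distinction behind the paper's finding that off-the-shelf LLMs mostly avoid rather than exhibit partisanship.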
Similar Papers
Decoding the Mind of Large Language Models: A Quantitative Evaluation of Ideology and Biases
Computation and Language
Finds AI's hidden opinions and unfair ideas.
Probing the Subtle Ideological Manipulation of Large Language Models
Computation and Language
Teaches computers to understand many political ideas.
Don't Change My View: Ideological Bias Auditing in Large Language Models
Computation and Language
Finds if AI is pushed to have certain opinions.