Revealing Potential Biases in LLM-Based Recommender Systems in the Cold Start Setting
By: Alexandre Andre, Gauthier Roy, Eva Dyer, and more
Potential Business Impact:
Finds unfairness in AI-generated recommendations.
Large Language Models (LLMs) are increasingly used for recommendation tasks due to their general-purpose capabilities. While LLMs perform well in rich-context settings, their behavior in cold-start scenarios, where only limited signals such as age, gender, or language are available, raises fairness concerns because they may rely on societal biases encoded during pretraining. We introduce a benchmark specifically designed to evaluate fairness in zero-context recommendation. Our modular pipeline supports configurable recommendation domains and sensitive attributes, enabling systematic and flexible audits of any open-source LLM. Through evaluations of state-of-the-art models (Gemma 3 and Llama 3.2), we uncover consistent biases across recommendation domains (music, movies, and colleges), including gendered and cultural stereotypes. We also reveal a non-linear relationship between model size and fairness, highlighting the need for nuanced analysis.
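The abstract describes a modular pipeline with configurable recommendation domains and sensitive attributes, but no code is reproduced here. The snippet below is a minimal sketch of what such a zero-context audit could look like, assuming an illustrative prompt template, attribute values, and a simple top-k overlap metric; none of these names, prompts, or metrics are taken from the paper, and `generate_fn` is a hypothetical callable wrapping any open-source LLM.

```python
# Hypothetical sketch of a zero-context fairness audit (not the authors' pipeline).
# It builds cold-start prompts from a single sensitive attribute, collects
# recommendations from an LLM, and compares item frequencies across groups.

from collections import Counter

# Configurable recommendation domain and sensitive attributes (illustrative values).
DOMAIN = "movies"
ATTRIBUTES = {
    "gender": ["a man", "a woman"],
    "age": ["a 20-year-old", "a 60-year-old"],
}

def build_prompt(persona: str, domain: str) -> str:
    """Cold-start prompt: the only user signal is a single sensitive attribute."""
    return f"I am {persona}. Recommend five {domain} for me. List only the titles."

def audit(generate_fn, n_samples: int = 20):
    """Run the audit.

    generate_fn(prompt: str) -> list[str] wraps any open-source LLM
    (e.g. a text-generation call plus output parsing).
    Returns a frequency table of recommended items per (attribute, value).
    """
    counts = {}
    for attr, values in ATTRIBUTES.items():
        for value in values:
            tally = Counter()
            for _ in range(n_samples):
                for item in generate_fn(build_prompt(value, DOMAIN)):
                    tally[item.strip().lower()] += 1
            counts[(attr, value)] = tally
    return counts

def top_k_overlap(counts, attr: str, k: int = 10) -> float:
    """Crude disparity signal: Jaccard overlap of the top-k recommended items
    between the two values of one attribute (lower = more divergent lists)."""
    groups = [set(dict(c.most_common(k))) for (a, _), c in counts.items() if a == attr]
    first, second = groups[0], groups[1]
    return len(first & second) / len(first | second)
```

In practice one would wrap a model such as Gemma 3 or Llama 3.2 behind `generate_fn` and compare the resulting frequency tables across attribute values; the top-k overlap shown here is only one of many possible disparity measures.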
Similar Papers
Where Should I Study? Biased Language Models Decide! Evaluating Fairness in LMs for Academic Recommendations
Computation and Language
AI unfairly favors rich countries and men.
Evaluating Position Bias in Large Language Model Recommendations
Information Retrieval
Fixes computer suggestions so order doesn't matter.