An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation
By: Daiki Shirafuji, Tatsuhiko Saito, Yasutomo Kimura
Potential Business Impact:
Fixes AI bias, but can make AI less smart.
Large language models (LLMs) are known to inherit and even amplify societal biases present in their pre-training corpora, threatening fairness and social trust. To address this issue, recent work has explored "editing" LLM parameters with model merging approaches to mitigate social bias; however, no empirical comparison of these approaches has been made. In this work, we empirically survey seven merging algorithms (Linear, Karcher Mean, SLERP, NuSLERP, TIES, DELLA, and Nearswap), applying them to 13 open-weight models from the GPT, LLaMA, and Qwen families. We perform a comprehensive evaluation using three bias datasets (BBQ, BOLD, and HONEST) and measure the impact of these techniques on downstream performance with the SuperGLUE benchmark. We find a trade-off between bias reduction and downstream performance: methods that achieve greater bias mitigation degrade accuracy, particularly on tasks requiring reading comprehension and commonsense or causal reasoning. Among the merging algorithms, Linear, SLERP, and Nearswap consistently reduce bias while maintaining overall performance, with SLERP at moderate interpolation weights emerging as the most balanced choice. These results highlight the potential of model merging algorithms for bias mitigation, while indicating that excessive debiasing or ill-suited merging methods can degrade important linguistic abilities.
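To make the merging setup concrete, the sketch below illustrates two of the surveyed algorithms, Linear and SLERP, combining a base checkpoint with a debiased variant at interpolation weight t (the abstract reports SLERP at moderate t as the most balanced choice). This is a minimal illustration under our own assumptions, not the authors' implementation: the function names (`linear_merge`, `slerp_merge`, `merge_state_dicts`) and the per-tensor flattening are hypothetical, and published experiments typically rely on a merging toolkit rather than hand-rolled code.

```python
# Minimal sketch of linear and SLERP parameter merging between a base model
# and a debiased variant. Names are illustrative, not from the paper.
import torch

def linear_merge(base: torch.Tensor, debiased: torch.Tensor, t: float) -> torch.Tensor:
    """Linear interpolation: (1 - t) * base + t * debiased."""
    return (1.0 - t) * base + t * debiased

def slerp_merge(base: torch.Tensor, debiased: torch.Tensor, t: float,
                eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two flattened weight tensors."""
    a, b = base.flatten().float(), debiased.flatten().float()
    a_unit = a / (a.norm() + eps)
    b_unit = b / (b.norm() + eps)
    # Angle between the two weight vectors on the unit sphere.
    omega = torch.arccos(torch.clamp(a_unit @ b_unit, -1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel vectors: SLERP degenerates, fall back to linear.
        merged = (1.0 - t) * a + t * b
    else:
        merged = (torch.sin((1.0 - t) * omega) * a
                  + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.reshape(base.shape).to(base.dtype)

def merge_state_dicts(base_sd: dict, debiased_sd: dict, t: float = 0.5) -> dict:
    """Apply SLERP per parameter tensor shared by the two checkpoints."""
    return {name: slerp_merge(w, debiased_sd[name], t)
            for name, w in base_sd.items()}
```

A usage pattern would be loading two `state_dict`s of the same architecture, calling `merge_state_dicts` with a moderate weight such as t = 0.3 to 0.5, and loading the result back into the model before running the bias and SuperGLUE evaluations.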
Similar Papers
A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs
Machine Learning (CS)
Combines medical AI knowledge without losing privacy.
A Systematic Study of Model Merging Techniques in Large Language Models
Computation and Language
Combines AI models to make them smarter without retraining.
Simulating a Bias Mitigation Scenario in Large Language Models
Computation and Language
Fixes computer language to be fair and honest.