Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
By: Jiali Cheng, Chirag Agarwal, Hadi Amiri
Potential Business Impact:
Makes AI less biased by teaching it better.
Knowledge distillation (KD) is an effective method for model compression and for transferring knowledge between models. However, its effect on a model's robustness against spurious correlations, which degrade performance on out-of-distribution data, remains underexplored. This study investigates the effect of knowledge distillation on the transferability of "debiasing" capabilities from teacher models to student models on natural language inference (NLI) and image classification tasks. Through extensive experiments, we illustrate several key findings: (i) overall, the debiasing capability of a model is undermined post-KD; (ii) training a debiased model does not benefit from injecting teacher knowledge; (iii) although the overall robustness of a model may remain stable post-distillation, significant variations can occur across different types of biases; and (iv) we pinpoint the internal attention patterns and circuits that cause the distinct behavior post-KD. Given the above findings, we propose three effective solutions to improve the distillability of debiasing methods: developing high-quality data for augmentation, implementing iterative knowledge distillation, and initializing student models with weights obtained from teacher models. To the best of our knowledge, this is the first study at scale on the effect of KD on debiasing and its internal mechanism. Our findings provide insight into how KD works and how to design better debiasing methods.
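For context, knowledge distillation in the standard formulation trains the student to match a temperature-softened version of the teacher's output distribution via a KL-divergence objective. Below is a minimal, dependency-free sketch of that loss; the logits and the 3-class setup (e.g., NLI labels) are illustrative, not from the paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in the standard KD objective."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

# Illustrative 3-class logits (e.g., entailment / neutral / contradiction)
teacher = [4.0, 1.0, 0.5]
student = [3.0, 1.5, 0.8]
loss = kd_loss(student, teacher, temperature=2.0)
print(f"KD loss: {loss:.4f}")
```

The study's concern is what this objective does *not* transfer: even when the student matches the teacher's soft targets well, the teacher's debiasing behavior on out-of-distribution inputs may not carry over.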
Similar Papers
The Role of Teacher Calibration in Knowledge Distillation
Machine Learning (CS)
Makes smaller computer brains learn better from big ones.
Revisiting Knowledge Distillation: The Hidden Role of Dataset Size
Machine Learning (CS)
Makes AI learn better with less data.
Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models
Computation and Language
Makes smart computer programs smaller and faster.