Downsized and Compromised?: Assessing the Faithfulness of Model Compression
By: Moumita Kamal, Douglas A. Talbert
Potential Business Impact:
Checks if smaller AI still acts like the big AI.
In real-world applications, computational constraints often require transforming large models into smaller, more efficient versions through model compression. While these techniques aim to reduce size and computational cost without sacrificing performance, their evaluations have traditionally focused on the size-accuracy trade-off, overlooking model faithfulness. This limited view is insufficient for high-stakes domains such as healthcare, finance, and criminal justice, where compressed models must remain faithful to the behavior of their original counterparts. This paper presents a novel approach to evaluating faithfulness in compressed models that moves beyond standard metrics. We introduce and demonstrate a set of faithfulness metrics that capture how model behavior changes after compression. Our contributions include techniques for assessing predictive consistency between the original and compressed models using model agreement, and chi-squared tests that detect statistically significant changes in predictive patterns across both the overall dataset and demographic subgroups, thereby exposing shifts that aggregate fairness metrics may obscure. We demonstrate our approaches by applying quantization and pruning to artificial neural networks (ANNs) trained on three diverse and socially meaningful datasets. Our findings show that high accuracy does not guarantee faithfulness, and our statistical tests detect subtle yet significant shifts that standard metrics, such as Accuracy and Equalized Odds, miss. The proposed metrics provide a practical and more direct way to ensure that efficiency gains from compression do not compromise the fairness or faithfulness essential for trustworthy AI.
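To make the evaluation concrete, the sketch below illustrates the two checks the abstract describes: a model-agreement score for predictive consistency, and a chi-squared test for statistically significant shifts in predictive patterns, run both on the overall dataset and within demographic subgroups. This is a minimal illustration, not the authors' released code; the function names, the use of scipy.stats.chi2_contingency, and the array-based inputs are assumptions made for this sketch.

import numpy as np
from scipy.stats import chi2_contingency

def agreement_rate(orig_preds, comp_preds):
    """Fraction of inputs on which the original and compressed models agree."""
    orig_preds = np.asarray(orig_preds)
    comp_preds = np.asarray(comp_preds)
    return float(np.mean(orig_preds == comp_preds))

def prediction_shift_test(orig_preds, comp_preds, num_classes):
    """Chi-squared test comparing the two models' predicted-class counts.
    A small p-value signals a statistically significant shift in the
    predictive pattern, even when overall accuracy looks unchanged."""
    counts = np.array([
        np.bincount(np.asarray(orig_preds), minlength=num_classes),
        np.bincount(np.asarray(comp_preds), minlength=num_classes),
    ])
    counts = counts[:, counts.sum(axis=0) > 0]  # drop never-predicted classes
    chi2, p_value, dof, _expected = chi2_contingency(counts)
    return chi2, p_value

def subgroup_shift_tests(orig_preds, comp_preds, groups, num_classes):
    """Run the shift test within each demographic subgroup to expose
    changes that aggregate metrics can average away."""
    orig_preds, comp_preds = np.asarray(orig_preds), np.asarray(comp_preds)
    groups = np.asarray(groups)
    return {
        g: prediction_shift_test(orig_preds[groups == g],
                                 comp_preds[groups == g], num_classes)
        for g in np.unique(groups)
    }

In practice, orig_preds and comp_preds would hold the predictions of the full-precision network and its quantized or pruned counterpart on the same test set, and groups would hold each example's demographic attribute (e.g., race or sex), so the per-subgroup p-values flag shifts that a single aggregate fairness metric could hide.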
Similar Papers
Compressed Models are NOT Trust-equivalent to Their Large Counterparts
Computation and Language
Checks if smaller AI models think like bigger ones.
Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code
Software Engineering
Shows that shrinking AI code checkers can make them less safe.
Compressed Feature Quality Assessment: Dataset and Baselines
CV and Pattern Recognition
Checks if compressed image features are still good.