Robustness Feature Adapter for Efficient Adversarial Training
By: Quanwei Wu, Jun Guo, Wei Wang, and more
Potential Business Impact:
Makes AI models harder to fool while keeping robust training affordable.
Adversarial training (AT) with projected gradient descent is the most popular method for improving model robustness against adversarial attacks. However, the computational overhead becomes prohibitively large when AT is applied to large backbone models. AT is also known to suffer from robust overfitting. This paper contributes to solving both problems simultaneously, towards building more trustworthy foundation models. In particular, we propose a new adapter-based approach for efficient AT directly in the feature space. We show that the proposed adapter-based approach can improve inner-loop convergence quality by eliminating robust overfitting. As a result, it significantly increases computational efficiency and improves model accuracy by generalizing adversarial robustness to unseen attacks. We demonstrate the effectiveness of the new adapter-based approach across different backbone architectures and in AT at scale.
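To make the setup concrete, the sketch below shows what adapter-based adversarial training in the feature space could look like in PyTorch. This is not the authors' implementation: the module and function names (FeatureAdapter, pgd_feature_attack, train_step), the bottleneck size, the PGD hyperparameters, and the choice to freeze the backbone are all assumptions made for illustration of the general idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Small residual bottleneck adapter applied to backbone features (illustrative)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, feats):
        # Residual bottleneck: feats + up(relu(down(feats)))
        return feats + self.up(F.relu(self.down(feats)))

def pgd_feature_attack(feats, labels, adapter, head, eps=0.5, alpha=0.1, steps=5):
    """PGD inner loop in feature space: finds a worst-case perturbation delta."""
    delta = torch.zeros_like(feats, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(head(adapter(feats + delta)), labels)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient ascent step on the loss, projected back into the L_inf ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

def train_step(backbone, adapter, head, optimizer, images, labels):
    """One AT step: the backbone is frozen; only adapter (and head) parameters update."""
    with torch.no_grad():                      # features computed once, no backbone gradients
        feats = backbone(images)
    delta = pgd_feature_attack(feats, labels, adapter, head)
    loss = F.cross_entropy(head(adapter(feats + delta)), labels)
    optimizer.zero_grad()
    loss.backward()                            # gradients flow only into adapter and head
    optimizer.step()
    return loss.item()
```

In such a setup the optimizer would be built over only the adapter and classification-head parameters, which is what keeps the per-step cost low relative to full-model adversarial training of a large backbone.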
Similar Papers
Identifying and Understanding Cross-Class Features in Adversarial Training
Machine Learning (CS)
Makes AI smarter and harder to trick.
Defense That Attacks: How Robust Models Become Better Attackers
CV and Pattern Recognition
Makes AI easier to trick with fake images.