GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients
By: Mohammad Mahdi Razmjoo, Mohammad Mahdi Sharifian, Saeed Bagheri Shouraki
Potential Business Impact:
Finds fake data fooling smart computer programs.
Despite their remarkable performance, deep neural networks exhibit a critical vulnerability: small, often imperceptible, adversarial perturbations can lead to drastically altered model predictions. Given the stringent reliability demands of applications such as medical diagnosis and autonomous driving, robust detection of such adversarial attacks is paramount. In this paper, we investigate the geometric properties of a model's input loss landscape. We analyze the Intrinsic Dimensionality (ID) of the model's gradient parameters, which quantifies the minimal number of coordinates required to describe the data points on their underlying manifold. We reveal a distinct and consistent difference in the ID for natural and adversarial data, which forms the basis of our proposed detection method. We validate our approach across two distinct operational scenarios. First, in a batch-wise context for identifying malicious data groups, our method demonstrates high efficacy on datasets like MNIST and SVHN. Second, in the critical individual-sample setting, we establish new state-of-the-art results on challenging benchmarks such as CIFAR-10 and MS COCO. Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92\% on CIFAR-10. The results underscore the robustness of our geometric approach, highlighting that intrinsic dimensionality is a powerful fingerprint for adversarial detection across diverse datasets and attack strategies.
Similar Papers
Intrinsic Dimensionality as a Model-Free Measure of Class Imbalance
Machine Learning (CS)
Measures data problems better for smarter computers.
Intrinsic Dimension Estimating Autoencoder (IDEA) Using CancelOut Layer and a Projected Loss
Machine Learning (CS)
Finds hidden patterns in complex data.
Intrinsic Dimension Estimating Autoencoder (IDEA) Using CancelOut Layer and a Projected Loss
Machine Learning (CS)
Finds hidden patterns, then rebuilds data.