Explainable but Vulnerable: Adversarial Attacks on XAI Explanation in Cybersecurity Applications
By: Maraz Mia, Mir Mehedi A. Pritom
Potential Business Impact:
Shows that AI explanations used in security tools can be manipulated, which motivates more resilient explanation methods.
Explainable Artificial Intelligence (XAI) gives machine learning (ML) researchers the means to scrutinize the decisions of black-box models. XAI methods expose a model's behavior and generate explanations that carry a perceived level of trust and transparency. However, the level of trust varies with the specific XAI method. XAI methods can themselves fall victim to post-adversarial attacks that manipulate the expected output of the explanation module. Notable attack tactics include fairwashing explanation (FE), manipulation explanation (ME), and backdoor-enabled manipulation attacks (BD). In this paper, we study these adversarial attack techniques, tactics, and procedures (TTPs) for altering explanations, and their effect on the model's decisions. We explore six individual attack procedures on post-hoc explanation methods such as SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), and IG (Integrated Gradients), and investigate these attacks in cybersecurity application scenarios such as phishing, malware, intrusion, and fraudulent website detection. Our experimental study demonstrates that these attacks are effective, underscoring the urgent need to improve the resilience of XAI methods and their applications.
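To make the idea of a manipulated explanation concrete, here is a minimal toy sketch of how a perturbation-based explainer could be misled. It is not one of the six attack procedures from the paper. It assumes a simplified, LIME-like weighted linear surrogate written from scratch (not the `lime` library) and a synthetic three-feature dataset. Every name in it (`biased_model`, `innocuous_model`, `is_off_manifold`, `adversarial_model`, `local_surrogate_weights`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def biased_model(X):
    # Assumed "true" decision rule: depends only on feature 0,
    # a stand-in for a feature the attacker wants to hide.
    return (X[:, 0] > 0).astype(float)

def innocuous_model(X):
    # Decoy rule shown to the explainer: depends only on feature 1.
    return (X[:, 1] > 0).astype(float)

def is_off_manifold(X, tol=1e-6):
    # Toy assumption: real data always satisfies x2 == x0, so any
    # violation marks a synthetic (perturbed) sample.
    return np.abs(X[:, 2] - X[:, 0]) > tol

def adversarial_model(X):
    # Use the decoy on perturbed samples and the biased rule on real ones.
    return np.where(is_off_manifold(X), innocuous_model(X), biased_model(X))

def local_surrogate_weights(model, x, n_samples=2000, scale=1.0, kernel_width=2.0):
    # Simplified LIME-like explainer: perturb x with Gaussian noise,
    # query the model, and fit a proximity-weighted linear surrogate.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = model(Z)
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    A = np.hstack([Z, np.ones((n_samples, 1))])  # add intercept column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[:-1]  # drop the intercept

# Synthetic "real" data lying on the manifold x2 == x0.
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
X_real = np.column_stack([x0, x1, x0])

# On real inputs the adversarial model behaves exactly like the biased one.
agreement = float(np.mean(adversarial_model(X_real) == biased_model(X_real)))

# Explain one on-manifold point under both models.
x = np.array([0.3, -0.2, 0.3])
honest_w = local_surrogate_weights(biased_model, x)
adv_w = local_surrogate_weights(adversarial_model, x)

honest_top = int(np.argmax(np.abs(honest_w)))
adv_top = int(np.argmax(np.abs(adv_w)))

print("agreement on real data:", agreement)
print("top feature, honest model:", honest_top)
print("top feature, adversarial model:", adv_top)
```

In this sketch the two models make identical predictions on real data, yet the surrogate attributes the adversarial model's decision to the decoy feature. This works because nearly all perturbed samples fall off the assumed data manifold. Real attacks would typically need a learned off-manifold detector instead of the exact equality check used here.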
Similar Papers
eXIAA: eXplainable Injections for Adversarial Attack
Machine Learning (CS)
Tricks AI into showing wrong reasons for its choices.
Robust Intrusion Detection System with Explainable Artificial Intelligence
Cryptography and Security
Stops sneaky computer tricks from breaking networks.
A Grey-box Text Attack Framework using Explainable AI
Computation and Language
Makes AI mistakes hidden from humans.