ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model
By: Bing He, Mustaque Ahamad, Srijan Kumar
Potential Business Impact:
Finds fake users even when they try to hide.
Detecting bad actors is critical to ensuring the safety and integrity of internet platforms. Several deep learning-based models have been developed to identify such users, and these models should not only detect bad actors accurately but also be robust against adversarial attacks that aim to evade detection. However, past deep learning-based detection models do not meet this robustness requirement because they are sensitive to even minor changes in the input sequence. To address this issue, we focus on (1) improving the model's understanding capability and (2) enriching the model's knowledge so that it can recognize potential input modifications when making predictions. To achieve these goals, we create a novel transformer-based classification model, called ROBAD (RObust adversary-aware local-global attended Bad Actor Detection model), which uses the sequence of a user's posts to generate a user embedding for detecting bad actors. Specifically, ROBAD first leverages the transformer encoder block to encode each post bidirectionally, building a post embedding that captures local information at the post level. Next, it adopts the transformer decoder block to model the sequential pattern in the post embeddings via the attention mechanism, generating a sequence embedding that captures global information at the sequence level. Finally, to enrich the model's knowledge, embeddings of sequences modified by mimicked attackers are fed into a contrastive-learning-enhanced classification layer for sequence prediction. In essence, by capturing both local and global information (i.e., post- and sequence-level information) and leveraging the mimicked behaviors of bad actors during training, ROBAD is robust to adversarial attacks. Extensive experiments on Yelp and Wikipedia datasets show that ROBAD effectively detects bad actors even under state-of-the-art adversarial attacks.
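To make the pipeline concrete, here is a minimal PyTorch sketch of a ROBAD-style model, not the authors' released code: the module names, layer counts, embedding dimension, mean pooling, the `attack()` perturbation function, and the InfoNCE-style contrastive loss are all illustrative assumptions.

```python
# Illustrative ROBAD-style sketch (NOT the authors' implementation).
# Assumptions: posts arrive as padded token-ID tensors, d_model=128,
# mean pooling, and an InfoNCE-style contrastive objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RobadSketch(nn.Module):
    def __init__(self, vocab_size=30522, d_model=128, n_heads=4, n_classes=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # (1) Local view: a transformer ENCODER block reads each post
        # bidirectionally and yields one post-level embedding.
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.post_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # (2) Global view: a transformer DECODER block attends over the
        # sequence of post embeddings to produce a sequence embedding.
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.seq_decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def embed(self, posts):
        # posts: (batch, n_posts, n_tokens) integer token IDs
        b, p, t = posts.shape
        tok = self.token_emb(posts.view(b * p, t))          # (b*p, t, d)
        post_emb = self.post_encoder(tok).mean(dim=1)       # local info per post
        post_emb = post_emb.view(b, p, -1)                  # (b, p, d)
        seq_emb = self.seq_decoder(post_emb, post_emb)      # attention over posts
        return seq_emb.mean(dim=1)                          # (b, d): global info

    def forward(self, posts):
        return self.classifier(self.embed(posts))

def contrastive_loss(z_clean, z_attacked, temperature=0.1):
    # InfoNCE-style loss (an assumption): pull each clean sequence embedding
    # toward its attacker-modified counterpart, push it away from other
    # sequences in the batch.
    z1 = F.normalize(z_clean, dim=-1)
    z2 = F.normalize(z_attacked, dim=-1)
    logits = z1 @ z2.t() / temperature                      # (b, b) similarities
    labels = torch.arange(z1.size(0), device=z1.device)     # positives on diagonal
    return F.cross_entropy(logits, labels)

# Training step (sketch). `attack(posts)` stands in for the mimicked
# attacker that perturbs the post sequence; it is hypothetical here.
# model = RobadSketch()
# z_clean = model.embed(posts)
# z_att = model.embed(attack(posts))
# loss = F.cross_entropy(model.classifier(z_clean), labels_cls) \
#        + contrastive_loss(z_clean, z_att)
```

The point of the sketch is the division of labor: the encoder supplies local post-level embeddings, the decoder's attention aggregates them into a global sequence embedding, and the contrastive term exposes the classifier to attacker-modified sequences during training.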
Similar Papers
Adversarially Robust Detection of Harmful Online Content: A Computational Design Science Approach
Machine Learning (CS)
Finds bad online words even when changed.
Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness
Cryptography and Security
Catches tricky fake emails, even AI ones.
Leveraging large language models for SQL behavior-based database intrusion detection
Cryptography and Security
Finds sneaky people trying to steal data.