A Comprehensive Study of Supervised Machine Learning Models for Zero-Day Attack Detection: Analyzing Performance on Imbalanced Data
By: Zahra Lotfi, Mostafa Lotfi
Potential Business Impact:
Finds hidden computer attacks before they happen.
Among the various types of cyberattacks, zero-day attacks are especially hard to identify because they are unknown to security systems: their patterns and characteristics do not match known, blacklisted attacks. Many Machine Learning (ML) models, especially supervised ones, have been designed to analyze and detect network attacks. However, these models classify samples (normal and attack) based on the patterns they learn during the training phase, so they perform poorly on unseen attacks. This research addresses the issue by evaluating five supervised models on their performance and execution time in predicting zero-day attacks, to find out which model is both accurate and fast. The goal is to improve the performance of these supervised models by proposing a framework that applies grid search, dimensionality reduction, and oversampling to overcome the imbalance problem, and by comparing the effect of oversampling on ML model metrics, in particular accuracy. To emulate real-life attack detection, this research uses a highly imbalanced data set and exposes the classifiers to zero-day attacks only during the testing phase, so the models are not trained to flag them. Our results show that Random Forest (RF) performs best under both oversampling and non-oversampling conditions, but this increased effectiveness comes at the cost of longer processing times. Therefore, we selected XGBoost (XGB) as the top model because it detects zero-day attacks both quickly and with high accuracy.
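The evaluation protocol described in the abstract (zero-day attacks withheld from training, oversampling applied to the imbalanced training data) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the oversampling method, so plain random oversampling stands in for it here, and the function names, the toy data, and the attack families ("dos", "worm") are all hypothetical. Grid search and dimensionality reduction are omitted.

```python
import random
from collections import Counter


def zero_day_split(samples, zero_day_families, test_fraction=0.3, seed=0):
    """Split (features, is_attack, family) tuples so that the families
    treated as zero-day appear only in the test set."""
    rng = random.Random(seed)
    known = [s for s in samples if s[2] not in zero_day_families]
    unseen = [s for s in samples if s[2] in zero_day_families]
    rng.shuffle(known)
    n_test = int(len(known) * test_fraction)
    train = known[n_test:]
    test = known[:n_test] + unseen  # zero-day samples are test-only
    return train, test


def random_oversample(train, seed=0):
    """Assumed stand-in for the paper's oversampling step: duplicate
    randomly chosen samples until every class (normal vs. attack)
    is as large as the biggest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample in train:
        by_class.setdefault(sample[1], []).append(sample)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical, highly imbalanced toy data: 90 normal, 8 known attacks,
# and 2 samples of a family that plays the role of a zero-day attack.
data = (
    [([i], 0, "normal") for i in range(90)]
    + [([i], 1, "dos") for i in range(8)]
    + [([i], 1, "worm") for i in range(2)]
)

train, test = zero_day_split(data, zero_day_families={"worm"})
balanced = random_oversample(train)
class_counts = Counter(is_attack for _, is_attack, _ in balanced)
```

Oversampling is applied only after the split, and only to the training portion, so duplicated samples cannot leak into the test set and the zero-day family stays unseen until evaluation.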
Similar Papers
Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data
Machine Learning (CS)
Finds hidden internet dangers in busy networks.
Cyber Security Data Science: Machine Learning Methods and their Performance on Imbalanced Datasets
Machine Learning (CS)
Finds computer threats faster by trying different tricks.
Malware Classification from Memory Dumps Using Machine Learning, Transformers, and Large Language Models
Machine Learning (CS)
Finds bad computer programs faster and better.