
A Comprehensive Study of Supervised Machine Learning Models for Zero-Day Attack Detection: Analyzing Performance on Imbalanced Data

Published: December 7, 2025 | arXiv ID: 2512.07030v1

By: Zahra Lotfi, Mostafa Lotfi

Potential Business Impact:

Finds hidden computer attacks before they happen.

Business Areas:

Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Among the various types of cyberattacks, zero-day attacks are especially hard to identify: they are unknown to security systems because their patterns and characteristics do not match known, blacklisted attacks. Many Machine Learning (ML) models, especially supervised ones, have been designed to analyze and detect network attacks. However, these models classify samples (normal and attack) based on the patterns they learn during the training phase, so they perform poorly on unseen attacks. This research addresses this issue by evaluating five supervised models, assessing their performance and execution time in predicting zero-day attacks to determine which model is both accurate and fast. The goal is to improve the performance of these supervised models by proposing a framework that applies grid search, dimensionality reduction, and oversampling to overcome the imbalance problem, and by comparing the effect of oversampling on ML model metrics, in particular accuracy. To emulate real-life attack detection, this research uses a highly imbalanced data set and exposes the classifiers to zero-day attacks only during the testing phase, so the models are not trained to flag zero-day attacks. Our results show that Random Forest (RF) performs best both with and without oversampling, but this increased effectiveness comes at the cost of longer processing times. Therefore, we selected XGBoost (XGB) as the top model because it detects zero-day attacks quickly and with high accuracy.
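The evaluation protocol described in the abstract (zero-day classes withheld from training, then oversampling applied to the imbalanced training set) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not name the oversampling method, so plain random duplication is assumed here, and the function names `zero_day_split` and `random_oversample`, the class labels, and the toy counts are all hypothetical.

```python
import random
from collections import Counter


def zero_day_split(samples, zero_day_labels, test_fraction=0.3, seed=0):
    """Split (features, label) pairs so zero-day labels appear only in the test set."""
    rng = random.Random(seed)
    known = [s for s in samples if s[1] not in zero_day_labels]
    unseen = [s for s in samples if s[1] in zero_day_labels]
    rng.shuffle(known)
    n_test = int(len(known) * test_fraction)
    # Known classes are shared between train and test; zero-day samples go to test only.
    test = known[:n_test] + unseen
    train = known[n_test:]
    return train, test


def random_oversample(train, seed=0):
    """Duplicate samples at random until every class is as large as the largest one.

    Assumption: simple random oversampling stands in for whatever method the paper uses.
    """
    rng = random.Random(seed)
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Toy, highly imbalanced data; labels and counts are invented for illustration.
samples = (
    [((i, 0), "normal") for i in range(90)]
    + [((i, 1), "dos") for i in range(8)]
    + [((i, 2), "worm") for i in range(2)]  # treated as the zero-day class
)

train, test = zero_day_split(samples, zero_day_labels={"worm"})
balanced_train = random_oversample(train)

train_counts = Counter(label for _, label in balanced_train)
test_counts = Counter(label for _, label in test)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

Oversampling is applied only after the split, so duplicated samples never leak into the test set, and the zero-day class stays unseen until evaluation.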

Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data

Machine Learning (CS)

Finds hidden internet dangers in busy networks.

15 May 2025 0

90%

Cyber Security Data Science: Machine Learning Methods and their Performance on Imbalanced Datasets

Machine Learning (CS)

Finds computer threats faster by trying different tricks.

7 May 2025 1

89%

Malware Classification from Memory Dumps Using Machine Learning, Transformers, and Large Language Models

Machine Learning (CS)

Finds bad computer programs faster and better.

4 Mar 2025 0


Country of Origin

🇮🇷 🇨🇦 Canada, Iran

Page Count

13 pages
