Cross-Domain Malware Detection via Probability-Level Fusion of Lightweight Gradient Boosting Models
By: Omar Khalid Ali Mohamed
The escalating sophistication of malware calls for detection mechanisms that generalize across diverse data sources. Models trained on a single dataset struggle to generalize across domains and often incur high computational costs. This paper presents a lightweight malware detection framework that fuses prediction probabilities across three distinct datasets: EMBER (static features), API Call Sequences (behavioral features), and CIC Obfuscated Memory (memory patterns). The method trains a separate LightGBM classifier on each dataset, keeps only the most predictive features to preserve efficiency, and combines the classifiers' predicted probabilities using weights optimized by grid search. Experiments show that the fused model achieves a macro F1-score of 0.823 on a cross-domain validation set, outperforming each individual model and generalizing better across domains. The framework keeps computational overhead low, making it suitable for real-time deployment, and all code and data are released for reproducibility.
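The fusion step described above can be illustrated with a minimal, dependency-free sketch. It assumes that every validation sample already has a malware probability from each of the three LightGBM models, that fusion weights are non-negative and sum to 1, that predictions use a 0.5 threshold, and that the grid step is 0.1. All function names and the toy probabilities below are hypothetical illustrations, not the author's code or data.

```python
from itertools import product


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the two classes (0 = benign, 1 = malware)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fuse(prob_lists, weights):
    """Weighted average of the per-model malware probabilities, per sample."""
    return [sum(w * p for w, p in zip(weights, probs)) for probs in zip(*prob_lists)]


def grid_search_weights(prob_lists, y_true, step=0.1, threshold=0.5):
    """Hypothetical helper: try every weight vector on a simplex grid
    (weights >= 0, summing to 1) and keep the one with the best macro F1."""
    n_steps = round(1 / step)
    best_weights, best_f1 = None, -1.0
    for combo in product(range(n_steps + 1), repeat=len(prob_lists)):
        if sum(combo) != n_steps:
            continue  # skip weight vectors that do not sum to 1
        weights = tuple(c / n_steps for c in combo)
        preds = [1 if p >= threshold else 0 for p in fuse(prob_lists, weights)]
        f1 = macro_f1(y_true, preds)
        if f1 > best_f1:  # strict '>' keeps the first best combination found
            best_weights, best_f1 = weights, f1
    return best_weights, best_f1


# Toy, made-up validation probabilities from the three (hypothetical) models.
y_val = [1, 0, 1, 0, 1, 0]
p_ember = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]
p_api = [0.7, 0.3, 0.8, 0.2, 0.6, 0.4]
p_mem = [0.6, 0.4, 0.7, 0.3, 0.4, 0.2]

weights, f1 = grid_search_weights([p_ember, p_api, p_mem], y_val)
print(weights, round(f1, 3))
```

Exhaustive search over a coarse simplex grid stays cheap for three models (11³ candidates at step 0.1, most of them filtered out by the sum-to-1 check). Finer grids or more models grow the search combinatorially, so a real implementation might prefer a smaller step only around the best coarse point.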