Cross-Domain Malware Detection via Probability-Level Fusion of Lightweight Gradient Boosting Models

Published: August 30, 2025 | arXiv ID: 2509.00476v1

By: Omar Khalid Ali Mohamed

Potential Business Impact:

Finds hidden computer viruses better and faster.

Business Areas:
A/B Testing, Data and Analytics

The escalating sophistication of malware necessitates robust detection mechanisms that generalize across diverse data sources. Traditional single-dataset models struggle with cross-domain generalization and often incur high computational costs. This paper presents a novel, lightweight framework for malware detection that employs probability-level fusion across three distinct datasets: EMBER (static features), API Call Sequences (behavioral features), and CIC Obfuscated Memory (memory patterns). Our method trains individual LightGBM classifiers on each dataset, selects top predictive features to ensure efficiency, and fuses their prediction probabilities using optimized weights determined via grid search. Extensive experiments demonstrate that our fusion approach achieves a macro F1-score of 0.823 on a cross-domain validation set, significantly outperforming individual models and providing superior generalization. The framework maintains low computational overhead, making it suitable for real-time deployment, and all code and data are provided for full reproducibility.
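
The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the paper's code: it assumes each validation sample already has one malware probability from each of the three classifiers (how samples are aligned across datasets is not stated here), and the function names, the 0.1 grid step, the 0.5 decision threshold, and the toy numbers are all hypothetical. It uses only the Python standard library so that the weighted averaging and the grid search stay visible.

```python
from itertools import product


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the binary labels 0 (benign) and 1 (malware)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fuse(probs, weights):
    """Weighted average of the per-model probabilities for each sample."""
    return [sum(w * p for w, p in zip(weights, row)) for row in probs]


def grid_search_weights(probs, y_true, steps=10, threshold=0.5):
    """Try every weight triple on a grid summing to 1; keep the best macro F1."""
    best_weights, best_score = None, -1.0
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        preds = [int(p >= threshold) for p in fuse(probs, weights)]
        score = macro_f1(y_true, preds)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Toy probabilities, invented for illustration only. Each row holds the
# (static, behavioral, memory) model outputs for one sample.
probs = [
    (0.9, 0.2, 0.4),
    (0.8, 0.3, 0.3),
    (0.2, 0.1, 0.6),
    (0.3, 0.2, 0.7),
]
y_true = [1, 1, 0, 0]

weights, score = grid_search_weights(probs, y_true)
print("weights:", weights, "macro F1:", score)
```

In a real pipeline the rows of `probs` would presumably come from the trained LightGBM models' predicted probabilities, and the weights would be tuned on held-out data rather than on the samples used for final evaluation.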

Country of Origin
🇸🇦 Saudi Arabia

Page Count
5 pages

Category
Computer Science:
Cryptography and Security