Malware Detection based on API Calls: A Reproducibility Study
By: Juhani Merilehto
Potential Business Impact:
Finds computer viruses by looking at how programs work.
This study independently reproduces the malware detection methodology presented by Fellicious et al. [7], which employs order-invariant API call frequency analysis using Random Forest classification. We utilized the original public dataset (250,533 training samples, 83,511 test samples) and replicated four model variants: Unigram, Bigram, Trigram, and Combined n-gram approaches. Our reproduction successfully validated all key findings, achieving F1-scores that exceeded the original results by 0.99% to 2.57% across all models at the optimal API call length of 2,500. The Unigram model achieved F1=0.8717 (original: 0.8631), confirming its effectiveness as a lightweight malware detector. Across three independent experimental runs with different random seeds, we observed remarkably consistent results with standard deviations below 0.5%, demonstrating high reproducibility. This study validates the robustness and scientific rigor of the original methodology while confirming the practical viability of frequency-based API call analysis for malware detection.
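To make the feature construction concrete, the following is a minimal sketch of n-gram frequency counting over an API call trace, not the authors' implementation. It assumes that "order-invariant" means each n-gram is counted regardless of where in the trace it occurs, and that a trace is truncated to its first 2,500 calls (the API call length the abstract reports as optimal). The function names, the `MAX_CALLS` constant, and the example trace are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumption: traces are cut to the first 2,500 API calls, the length
# the abstract reports as optimal.
MAX_CALLS = 2500

NGram = Tuple[str, ...]


def ngram_counts(calls: Sequence[str], n: int, max_calls: int = MAX_CALLS) -> Counter:
    """Count every n-gram in the first max_calls API calls of one trace.

    Only how often an n-gram occurs is kept; its position in the trace
    is discarded.
    """
    window = list(calls[:max_calls])
    return Counter(tuple(window[i:i + n]) for i in range(len(window) - n + 1))


def combined_features(calls: Sequence[str], max_n: int = 3,
                      max_calls: int = MAX_CALLS) -> Counter:
    """Merge unigram, bigram, and trigram counts into one feature bag."""
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        features.update(ngram_counts(calls, n, max_calls))
    return features


def to_vector(features: Counter, vocabulary: Sequence[NGram]) -> List[int]:
    """Project a feature bag onto a fixed vocabulary (0 for unseen n-grams)."""
    return [features.get(gram, 0) for gram in vocabulary]


if __name__ == "__main__":
    # Hypothetical trace, for illustration only.
    trace = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtClose"]
    bag = combined_features(trace)
    vocabulary = sorted(bag)  # in practice, built from the training set
    print(to_vector(bag, vocabulary))
```

Vectors built this way could then be passed to a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`); that step is omitted here to keep the sketch dependency-free. Calling `ngram_counts(trace, 1)` alone corresponds to the Unigram variant, while `combined_features` corresponds to the Combined variant under the assumptions above.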