Reducing Instability in Synthetic Data Evaluation with a Super-Metric in MalDataGen
By: Anna Luiza Gomes da Silva, Diego Kreutz, Angelo Diniz, and more
Potential Business Impact:
Helps judge how good fake malware data is for training phone security.
Evaluating the quality of synthetic data remains a persistent challenge in the Android malware domain due to instability and the lack of standardization among existing metrics. This work integrates into MalDataGen a Super-Metric that aggregates eight metrics across four fidelity dimensions, producing a single weighted score. Experiments involving ten generative models and five balanced datasets demonstrate that the Super-Metric is more stable and consistent than traditional metrics, exhibiting stronger correlations with the actual performance of classifiers.
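The abstract describes the aggregation only at a high level. The sketch below shows one plausible way to compute such a weighted score. The names of the eight metrics and four dimensions, the equal weights, the per-dimension averaging, and the `invert_distance` transform are hypothetical assumptions. They are not MalDataGen's actual configuration or API.

```python
"""Minimal sketch of a weighted "super-metric" over grouped fidelity metrics.

All dimension names, metric names, and weights below are hypothetical
placeholders; they are not taken from MalDataGen.
"""

from typing import Dict, List, Mapping

# Hypothetical grouping: eight metrics split across four fidelity dimensions.
EXAMPLE_DIMENSIONS: Dict[str, List[str]] = {
    "dim_a": ["metric_1", "metric_2"],
    "dim_b": ["metric_3", "metric_4"],
    "dim_c": ["metric_5", "metric_6"],
    "dim_d": ["metric_7", "metric_8"],
}

# Hypothetical weights: the four dimensions count equally.
EXAMPLE_WEIGHTS: Dict[str, float] = {d: 0.25 for d in EXAMPLE_DIMENSIONS}


def invert_distance(distance: float) -> float:
    """Map a non-negative, lower-is-better distance into (0, 1], higher is better.

    The 1 / (1 + d) transform is an illustrative choice, not the paper's.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


def super_metric(
    scores: Mapping[str, float],
    dimensions: Mapping[str, List[str]],
    weights: Mapping[str, float],
) -> float:
    """Return the weighted mean of per-dimension average scores.

    Every score must already be normalized to [0, 1], with higher meaning
    better fidelity. Weights are renormalized, so they need not sum to 1.
    """
    total_weight = sum(weights[dim] for dim in dimensions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_sum = 0.0
    for dim, metric_names in dimensions.items():
        if not metric_names:
            raise ValueError(f"dimension {dim!r} has no metrics")
        for name in metric_names:
            value = scores[name]  # raises KeyError if a metric is missing
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [0, 1]")
        # Average the metrics inside one dimension so that a dimension with
        # more metrics does not dominate the final score.
        dim_score = sum(scores[name] for name in metric_names) / len(metric_names)
        weighted_sum += weights[dim] * dim_score

    return weighted_sum / total_weight


if __name__ == "__main__":
    example_scores = {
        "metric_1": 1.0, "metric_2": 0.0,
        "metric_3": 0.8, "metric_4": 0.6,
        "metric_5": 0.9, "metric_6": 0.9,
        "metric_7": invert_distance(1.0),  # a distance of 1.0 maps to 0.5
        "metric_8": 0.1,
    }
    # Dimension averages are 0.5, 0.7, 0.9 and 0.3, so the equal-weight score is 0.6.
    print(round(super_metric(example_scores, EXAMPLE_DIMENSIONS, EXAMPLE_WEIGHTS), 4))
```

Averaging within each dimension before weighting is one design choice. It ties a dimension's influence to its weight rather than to how many metrics it happens to contain.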
Similar Papers
Synthetic Data: AI's New Weapon Against Android Malware
Cryptography and Security
Creates fake malware to train phone security.
MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
Cryptography and Security
Creates fake computer virus data to train defenses.
ThreatIntel-Andro: Expert-Verified Benchmarking for Robust Android Malware Research
Cryptography and Security
Finds bad phone apps to protect computers.