Mitigating Distribution Shift in Graph-Based Android Malware Classification via Function Metadata and LLM Embeddings

Published: August 8, 2025 | arXiv ID: 2508.06734v1

By: Ngoc N. Tran, Anwar Said, Waseem Abbas and more

Potential Business Impact:

Finds hidden computer virus patterns better.

Graph-based malware classifiers can achieve over 94% accuracy on standard Android datasets, yet we find they suffer accuracy drops of up to 45% when evaluated on previously unseen malware variants from the same family, a scenario where strong generalization would typically be expected. This highlights a key limitation in existing approaches: both the model architectures and their structure-only representations often fail to capture deeper semantic patterns. In this work, we propose a robust semantic enrichment framework that enhances function call graphs with contextual features, including function-level metadata and, when available, code embeddings derived from large language models. The framework is designed to operate under real-world constraints where feature availability is inconsistent, and supports flexible integration of semantic signals. To evaluate generalization under realistic domain and temporal shifts, we introduce two new benchmarks: MalNet-Tiny-Common and MalNet-Tiny-Distinct, constructed using malware family partitioning to simulate cross-family generalization and evolving threat behavior. Experiments across multiple graph neural network backbones show that our method improves classification performance by up to 8% under distribution shift and consistently enhances robustness when integrated with adaptation-based methods. These results offer a practical path toward building resilient malware detection systems in evolving threat environments.
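As a rough illustration of the idea of enriching call-graph nodes when code embeddings are only sometimes available, the sketch below builds one feature vector per function node. It is not the authors' implementation: the zero-fill-plus-availability-flag strategy, the metadata fields, the embedding width `EMBED_DIM`, and the function names `node_features` and `enrich_graph` are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence

EMBED_DIM = 4  # hypothetical embedding width, for illustration only


def node_features(
    metadata: Sequence[float],
    embedding: Optional[Sequence[float]],
    embed_dim: int = EMBED_DIM,
) -> List[float]:
    """Concatenate metadata, an embedding (or zeros), and an availability flag."""
    if embedding is None:
        # Assumed fallback: no embedding for this function, so zero-fill
        # the slot and set the trailing flag to 0.0.
        return list(metadata) + [0.0] * embed_dim + [0.0]
    if len(embedding) != embed_dim:
        raise ValueError("embedding has unexpected width")
    return list(metadata) + list(embedding) + [1.0]


def enrich_graph(
    nodes: Dict[str, dict], embed_dim: int = EMBED_DIM
) -> Dict[str, List[float]]:
    """Map each function node to a fixed-width feature vector."""
    return {
        name: node_features(attrs["metadata"], attrs.get("embedding"), embed_dim)
        for name, attrs in nodes.items()
    }


# Hypothetical nodes: metadata = [is_external, normalized_instruction_count].
call_graph_nodes = {
    "app_func": {"metadata": [0.0, 0.5], "embedding": [0.1, 0.2, 0.3, 0.4]},
    "lib_func": {"metadata": [1.0, 0.0], "embedding": None},
}
features = enrich_graph(call_graph_nodes)
```

Every node ends up with a vector of the same width, so the result can be stacked into the node-feature matrix of a graph neural network regardless of which functions had embeddings.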

Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification

Cryptography and Security

Identifies computer virus types automatically.

4 Dec 2025

MalVis: A Large-Scale Image-Based Framework and Dataset for Advancing Android Malware Classification

Cryptography and Security

Finds hidden phone viruses by seeing patterns.

17 May 2025

DeepTrust: Multi-Step Classification through Dissimilar Adversarial Representations for Robust Android Malware Detection

Cryptography and Security

Stops bad apps from tricking phone security.

14 Oct 2025

Country of Origin

🇺🇸 United States

Page Count

13 pages
