HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware Analysis
By: Han Chen, Hanchen Wang, Hongmei Chen and more
Potential Business Impact:
Helps computers spot bad software better.
The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hierarchical structure of software. Existing methods often oversimplify programs into single-level graphs, failing to model the crucial semantic relationship between high-level functional interactions and low-level instruction logic. To bridge this gap, we introduce HiGraph, the largest public hierarchical graph dataset for malware analysis, comprising over 200M Control Flow Graphs (CFGs) nested within 595K Function Call Graphs (FCGs). This two-level representation preserves structural semantics essential for building robust detectors resilient to code obfuscation and malware evolution. We demonstrate HiGraph's utility through a large-scale analysis that reveals distinct structural properties of benign and malicious software, establishing it as a foundational benchmark for the community. The dataset and tools are publicly available at https://higraph.org.
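The two-level representation described in the abstract, with one CFG nested inside each node of an FCG, can be pictured with a minimal Python sketch. This is an illustration only: the class names, fields, and the toy program below are assumptions made for this example and do not reflect HiGraph's actual file format or tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """Low level (hypothetical layout): basic blocks of one function and jumps between them."""
    blocks: list[str]
    edges: list[tuple[int, int]] = field(default_factory=list)


@dataclass
class FunctionCallGraph:
    """High level (hypothetical layout): one node per function, each holding its own CFG."""
    cfgs: dict[str, ControlFlowGraph] = field(default_factory=dict)
    calls: list[tuple[str, str]] = field(default_factory=list)

    def add_function(self, name: str, cfg: ControlFlowGraph) -> None:
        self.cfgs[name] = cfg

    def add_call(self, caller: str, callee: str) -> None:
        # Both endpoints must already exist as functions in this program.
        if caller not in self.cfgs or callee not in self.cfgs:
            raise KeyError(f"unknown function in call {caller} -> {callee}")
        self.calls.append((caller, callee))

    def size(self) -> tuple[int, int, int, int]:
        """Return (functions, call edges, total basic blocks, total CFG edges)."""
        total_blocks = sum(len(cfg.blocks) for cfg in self.cfgs.values())
        total_edges = sum(len(cfg.edges) for cfg in self.cfgs.values())
        return len(self.cfgs), len(self.calls), total_blocks, total_edges


# Toy program: `main` branches into two blocks and calls `helper`.
sample = FunctionCallGraph()
sample.add_function(
    "main",
    ControlFlowGraph(
        blocks=["cmp eax, 0", "call helper", "ret"],
        edges=[(0, 1), (0, 2)],
    ),
)
sample.add_function("helper", ControlFlowGraph(blocks=["ret"]))
sample.add_call("main", "helper")

print(sample.size())  # (2, 1, 4, 2)
```

Keeping each CFG attached to its function node means a model can read call-level structure and instruction-level structure from the same object, which is the relationship a single-level graph discards.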
Similar Papers
Better Call Graphs: A New Dataset of Function Call Graphs for Malware Classification
Cryptography and Security
Creates better tools to find phone viruses.
Interactive Hypergraph Visual Analytics for Exploring Large and Complex Image Collections
Graphics
Shows hidden links between many pictures.
Mitigating Distribution Shift in Graph-Based Android Malware Classification via Function Metadata and LLM Embeddings
Cryptography and Security
Finds hidden computer virus patterns better.