APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training
By: Xuebo Qiu, Mingqi Lv, Yimei Zhang and more
Potential Business Impact:
Finds sneaky computer hackers using smart AI.
Provenance-based threat hunting identifies Advanced Persistent Threats (APTs) on endpoints by correlating attack patterns described in Cyber Threat Intelligence (CTI) with provenance graphs derived from system audit logs. A fundamental challenge in this paradigm lies in the modality gap -- the structural and semantic disconnect between provenance graphs and CTI reports. Prior work addresses this by framing threat hunting as a graph matching task: 1) extracting attack graphs from CTI reports, and 2) aligning them with provenance graphs. However, this pipeline incurs severe information loss during graph extraction and demands intensive manual curation, undermining scalability and effectiveness. In this paper, we present APT-CGLP, a novel cross-modal APT hunting system via Contrastive Graph-Language Pre-training, facilitating end-to-end semantic matching between provenance graphs and CTI reports without human intervention. First, empowered by the Large Language Model (LLM), APT-CGLP mitigates data scarcity by synthesizing high-fidelity provenance graph-CTI report pairs, while simultaneously distilling actionable insights from noisy web-sourced CTIs to improve their operational utility. Second, APT-CGLP incorporates a tailored multi-objective training algorithm that synergizes contrastive learning with inter-modal masked modeling, promoting cross-modal attack semantic alignment at both coarse- and fine-grained levels. Extensive experiments on four real-world APT datasets demonstrate that APT-CGLP consistently outperforms state-of-the-art threat hunting baselines in terms of accuracy and efficiency.
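To make the coarse-grained contrastive objective mentioned in the abstract more concrete, here is a minimal sketch of a symmetric InfoNCE-style loss over paired graph and text embeddings. The abstract does not give the paper's actual loss, encoders, or hyperparameters, so this is an assumption for illustration only: the function names, the `temperature` value of 0.07, and the use of plain Python lists as embeddings are all hypothetical, and the inter-modal masked modeling objective is not covered.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(graph_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss (illustrative, not the paper's code).

    graph_embs[i] and text_embs[i] are assumed to be a matched
    provenance-graph / CTI-report pair; every other pairing in the
    batch is treated as a negative.
    """
    g = [l2_normalize(v) for v in graph_embs]
    t = [l2_normalize(v) for v in text_embs]
    # Cosine similarity of every graph against every report, scaled by temperature.
    logits = [[dot(gi, tj) / temperature for tj in t] for gi in g]
    logits_t = [list(col) for col in zip(*logits)]  # report-to-graph direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy usage: matched pairs should score a lower loss than swapped pairs.
graphs = [[1.0, 0.0], [0.0, 1.0]]
reports_matched = [[1.0, 0.0], [0.0, 1.0]]
reports_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_contrastive_loss(graphs, reports_matched))
print(symmetric_contrastive_loss(graphs, reports_swapped))
```

In a loss of this kind, training pulls each graph embedding toward its paired report and pushes it away from the other reports in the batch, which is one common way to narrow a modality gap like the one the abstract describes.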
Similar Papers
An Automated Attack Investigation Approach Leveraging Threat-Knowledge-Augmented Large Language Models
Cryptography and Security
Finds hidden computer attacks and explains them clearly.
Distributed Temporal Graph Learning with Provenance for APT Detection in Supply Chains
Cryptography and Security
Finds sneaky computer attacks hidden in software.
Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection
Cryptography and Security
Finds hidden computer attacks using smart AI.