APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training

Published: November 25, 2025 | arXiv ID: 2511.20290v1

By: Xuebo Qiu, Mingqi Lv, Yimei Zhang and more

Potential Business Impact:

Finds sneaky computer hackers using smart AI.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Provenance-based threat hunting identifies Advanced Persistent Threats (APTs) on endpoints by correlating attack patterns described in Cyber Threat Intelligence (CTI) with provenance graphs derived from system audit logs. A fundamental challenge in this paradigm is the modality gap: the structural and semantic disconnect between provenance graphs and CTI reports. Prior work addresses this by framing threat hunting as a graph matching task: 1) extracting attack graphs from CTI reports, and 2) aligning them with provenance graphs. However, this pipeline incurs severe information loss during graph extraction and demands intensive manual curation, undermining scalability and effectiveness. In this paper, we present APT-CGLP, a novel cross-modal APT hunting system based on Contrastive Graph-Language Pre-training, which enables end-to-end semantic matching between provenance graphs and CTI reports without human intervention. First, using a Large Language Model (LLM), APT-CGLP mitigates data scarcity by synthesizing high-fidelity pairs of provenance graphs and CTI reports, while also distilling actionable insights from noisy web-sourced CTIs to improve their operational utility. Second, APT-CGLP incorporates a tailored multi-objective training algorithm that combines contrastive learning with inter-modal masked modeling, promoting cross-modal alignment of attack semantics at both coarse- and fine-grained levels. Extensive experiments on four real-world APT datasets demonstrate that APT-CGLP consistently outperforms state-of-the-art threat hunting baselines in terms of accuracy and efficiency.
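
To make the contrastive part of the abstract concrete, here is a minimal sketch of a generic symmetric (CLIP-style) InfoNCE loss over paired graph and report embeddings. It is not the paper's training objective, which the abstract does not spell out. The function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all illustrative assumptions, and the sketch assumes that row i of each list forms a matching pair.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of similarity scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(graph_embs, text_embs, temperature=0.07):
    """Generic CLIP-style InfoNCE loss (an assumption, not the paper's exact loss).

    graph_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    g = [l2_normalize(v) for v in graph_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(g)
    # sim[i][j] = cosine similarity of graph i and report j, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(g[i], t[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Graph-to-text direction: each graph should pick out its own report.
    g2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-graph direction: transpose and repeat.
    sim_t = [list(col) for col in zip(*sim)]
    t2g = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (g2t + t2g)

# Toy embeddings (hypothetical values, standing in for encoder outputs).
graphs = [[1.0, 0.0], [0.0, 1.0]]
reports = [[0.9, 0.1], [0.2, 0.8]]
loss_matched = symmetric_contrastive_loss(graphs, reports)
loss_swapped = symmetric_contrastive_loss(graphs, reports[::-1])
print(loss_matched < loss_swapped)
```

Correctly paired embeddings give a lower loss than the same embeddings with the reports swapped, which is the signal that pulls matching graph and report representations together during pre-training.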

An Automated Attack Investigation Approach Leveraging Threat-Knowledge-Augmented Large Language Models

Cryptography and Security

Finds hidden computer attacks and explains them clearly.

1 Sep 2025 2

89%

Distributed Temporal Graph Learning with Provenance for APT Detection in Supply Chains

Cryptography and Security

Finds sneaky computer attacks hidden in software.

3 Apr 2025 0

89%

Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection

Cryptography and Security

Finds hidden computer attacks using smart AI.

24 Mar 2025 1

Country of Origin

🇨🇳 China

Page Count

13 pages
