Graph Synthetic Out-of-Distribution Exposure with Large Language Models
By: Haoyan Xu, Zhengtao Yao, Ziyi Wang, and others
Potential Business Impact:
Spots unfamiliar data in graph networks without needing real examples of it.
Out-of-distribution (OOD) detection in graphs is critical for ensuring model robustness in open-world and safety-sensitive applications. Existing graph OOD detection approaches typically train an in-distribution (ID) classifier on ID data alone, then apply post-hoc scoring to detect OOD instances. While OOD exposure (adding auxiliary OOD samples during training) can improve detection, current graph-based methods typically assume access to real OOD nodes, which is often impractical or costly. In this paper, we present GOE-LLM, a framework that leverages Large Language Models (LLMs) to achieve OOD exposure on text-attributed graphs without using any real OOD nodes. GOE-LLM introduces two pipelines: (1) identifying pseudo-OOD nodes from the initially unlabeled graph using zero-shot LLM annotations, and (2) generating semantically informative synthetic OOD nodes via LLM-prompted text generation. These pseudo-OOD nodes are then used to regularize ID classifier training and enhance OOD detection awareness. Empirical results on multiple benchmarks show that GOE-LLM substantially outperforms state-of-the-art methods without OOD exposure, achieving up to a 23.5% improvement in AUROC for OOD detection, and attains performance on par with methods that rely on real OOD labels for exposure.
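To make the exposure idea concrete, the following is a minimal NumPy sketch of one common way pseudo-OOD nodes can regularize an ID classifier: standard cross-entropy on labeled ID nodes plus a term pushing predictions on pseudo-OOD nodes toward the uniform distribution, with a post-hoc maximum-softmax-probability (MSP) score for detection. The loss form, the `lam` weight, and the MSP scorer are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def exposure_loss(logits_id, labels_id, logits_ood, lam=0.5):
    """Cross-entropy on ID nodes plus a uniformity regularizer on
    pseudo-OOD nodes. Illustrative sketch, not GOE-LLM's exact loss."""
    n, k = logits_id.shape
    p_id = softmax(logits_id)
    ce = -np.log(p_id[np.arange(n), labels_id] + 1e-12).mean()
    # KL(p_ood || uniform): encourages flat predictions on pseudo-OOD nodes.
    p_ood = softmax(logits_ood)
    reg = (p_ood * np.log(p_ood * k + 1e-12)).sum(axis=1).mean()
    return ce + lam * reg

def msp_score(logits):
    """Post-hoc maximum-softmax-probability score; lower = more OOD-like."""
    return softmax(logits).max(axis=1)
```

At test time, nodes whose MSP score falls below a threshold would be flagged as OOD; a confident ID prediction yields a score near 1, while a uniform prediction over k classes yields 1/k.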
Similar Papers
GLIP-OOD: Zero-Shot Graph OOD Detection with Graph Foundation Model
Machine Learning (CS)
Helps computers spot fake data in networks.
Large Language Model Enhanced Graph Invariant Contrastive Learning for Out-of-Distribution Recommendation
Information Retrieval
Helps movie suggestions work even with new users.
Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs
Computation and Language
Keeps AI from making mistakes on new topics.