Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics
By: Di Lin, Wanjing Ren, Xuanbin Li and more
Potential Business Impact:
Helps computers learn from messy, incomplete data.
In graph self-supervised learning, masked autoencoders (MAE) and contrastive learning (CL) are two prominent paradigms. MAE focuses on reconstructing masked elements, while CL maximizes the similarity between augmented graph views. Recent studies highlight their complementarity: MAE excels at capturing local features, while CL excels at extracting global information. Hybrid frameworks have been proposed for homogeneous graphs, but they struggle to design a shared encoder that satisfies the semantic requirements of both tasks. In semantically sparse scenarios, CL also struggles with view construction, and the gradient imbalance between positive and negative samples persists. This paper introduces HetCRF, a novel dual-channel self-supervised learning framework for heterogeneous graphs. HetCRF uses a two-stage aggregation strategy to adapt embedding semantics, making them compatible with both MAE and CL. To address semantic sparsity, it constructs contrastive views from enhanced encoder outputs rather than from raw features, improving efficiency. Two positive-sample augmentation strategies are also proposed to balance gradient contributions. Node classification experiments on four real-world heterogeneous graph datasets show that HetCRF outperforms state-of-the-art baselines. On datasets with missing node features, such as Aminer and Freebase, at a 40% label rate HetCRF improves the Macro-F1 score by 2.75% and 2.2%, respectively, over the second-best baseline, validating its effectiveness.
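The abstract does not give implementation details, but the dual-channel idea can be illustrated with a minimal sketch: one channel masks node features and reconstructs them (MAE), while the other builds two views from encoder outputs and applies an InfoNCE contrastive loss (CL). The MLP encoder, masking rate, dropout-based view construction, and loss weight below are illustrative assumptions, not HetCRF's actual two-stage aggregation or augmentation strategies.

```python
# Minimal sketch (not the authors' code): a dual-channel self-supervised objective that
# combines masked-feature reconstruction with an InfoNCE contrastive loss over two views
# built from encoder outputs rather than raw features. All architectural choices here
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualChannelSSL(nn.Module):
    def __init__(self, in_dim, hid_dim, mask_rate=0.5, temperature=0.5, lam=0.5):
        super().__init__()
        # Stand-in for a heterogeneous graph encoder (HetCRF uses a two-stage aggregation encoder).
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ELU(),
                                     nn.Linear(hid_dim, hid_dim))
        self.decoder = nn.Linear(hid_dim, in_dim)          # reconstructs masked node features
        self.mask_token = nn.Parameter(torch.zeros(1, in_dim))
        self.mask_rate, self.temperature, self.lam = mask_rate, temperature, lam

    def forward(self, x):
        n = x.size(0)
        # --- MAE channel: mask a subset of nodes and reconstruct their raw features ---
        mask = torch.rand(n, device=x.device) < self.mask_rate
        x_masked = torch.where(mask.unsqueeze(1), self.mask_token.expand_as(x), x)
        z = self.encoder(x_masked)
        recon_loss = F.mse_loss(self.decoder(z)[mask], x[mask])

        # --- CL channel: build two views from encoder outputs, not from raw features ---
        z1 = self.encoder(F.dropout(x, p=0.2, training=True))
        z2 = self.encoder(F.dropout(x, p=0.2, training=True))
        contrast_loss = self.info_nce(z1, z2)

        return recon_loss + self.lam * contrast_loss

    def info_nce(self, z1, z2):
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / self.temperature              # cross-view similarity of every node pair
        labels = torch.arange(z1.size(0), device=z1.device)  # positives lie on the diagonal
        return F.cross_entropy(logits, labels)

# Usage on random features standing in for a graph's target-node features.
model = DualChannelSSL(in_dim=64, hid_dim=32)
loss = model(torch.randn(100, 64))
loss.backward()
```

In this sketch the contrastive weight `lam` plays the role of balancing the two channels; HetCRF additionally rebalances gradients between positive and negative samples with its two positive-sample augmentation strategies, which are not modeled here.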
Similar Papers
CORE: Contrastive Masked Feature Reconstruction on Graphs
Machine Learning (CS)
Helps computers understand online connections better.
Incorporating Attributes and Multi-Scale Structures for Heterogeneous Graph Contrastive Learning
Machine Learning (CS)
Teaches computers to understand complex relationships without labels.
HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation
Machine Learning (CS)
Helps computers understand text and connections better.