Rethinking GNNs and Missing Features: Challenges, Evaluation and a Robust Solution
By: Francesco Ferrini, Veronica Lachi, Antonio Longa, and more
Potential Business Impact:
Helps computers learn from messy, incomplete data.
Handling missing node features is a key challenge for deploying Graph Neural Networks (GNNs) in real-world domains such as healthcare and sensor networks. Existing studies mostly address relatively benign scenarios, namely benchmark datasets with (a) high-dimensional but sparse node features and (b) incomplete data generated under Missing Completely At Random (MCAR) mechanisms. For (a), we theoretically prove that high sparsity substantially limits the information loss caused by missingness, making all models appear robust and preventing a meaningful comparison of their performance. To overcome this limitation, we introduce one synthetic and three real-world datasets with dense, semantically meaningful features. For (b), we move beyond MCAR and design evaluation protocols with more realistic missingness mechanisms. Moreover, we provide theoretical background for stating explicit assumptions about the missingness process and analyzing their implications for different methods. Building on this analysis, we propose GNNmim, a simple yet effective baseline for node classification with incomplete feature data. Experiments show that GNNmim is competitive with specialized architectures across diverse datasets and missingness regimes.
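The evaluation protocols described in the abstract contrast the commonly used MCAR setting with more realistic missingness mechanisms. As a rough illustration of that distinction only (this is not the paper's actual protocol; the function names, the rank-based drop probability, and the choice of a "driver" column are assumptions made for this sketch), the snippet below generates an MCAR mask and a simple MAR-style mask over a dense node feature matrix.

```python
import numpy as np

def mcar_mask(X, p_missing, rng):
    # MCAR: every entry is dropped independently with the same probability,
    # regardless of feature values or node properties.
    return rng.random(X.shape) < p_missing

def mar_mask(X, p_missing, rng, driver_col=0):
    # MAR-style (illustrative assumption): each node's drop probability depends
    # on an always-observed covariate (here, one retained column), so the
    # missingness is informative but still explainable from observed data.
    driver = X[:, driver_col]
    ranks = driver.argsort().argsort() / (len(driver) - 1)   # normalize ranks to [0, 1]
    node_p = np.clip(2.0 * p_missing * ranks, 0.0, 1.0)      # higher rank -> more missing
    mask = rng.random(X.shape) < node_p[:, None]
    mask[:, driver_col] = False                               # keep the driver column observed
    return mask

# Usage: X is a dense (num_nodes, num_features) node feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X_mcar = np.where(mcar_mask(X, 0.3, rng), np.nan, X)
X_mar = np.where(mar_mask(X, 0.3, rng), np.nan, X)
```

Under the MAR-style mask in this sketch, nodes with larger values of the observed driver column lose more of their features, so evaluating under such a mask stresses methods that implicitly assume the missingness pattern carries no signal.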
Similar Papers
Feature-Enhanced Graph Neural Networks for Classification of Synthetic Graph Generative Models: A Benchmarking Study
Machine Learning (CS)
Teaches computers to tell different kinds of fake networks apart.
Model-Agnostic Fairness Regularization for GNNs with Incomplete Sensitive Information
Machine Learning (CS)
Makes computer learning fairer for everyone.
The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News
Machine Learning (CS)
Makes fake news detectors work better on real data.