Rethinking GNNs and Missing Features: Challenges, Evaluation and a Robust Solution

Published: January 8, 2026 | arXiv ID: 2601.04855v1

By: Francesco Ferrini, Veronica Lachi, Antonio Longa and more

Potential Business Impact:

Helps computers learn from messy, incomplete data.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Handling missing node features is a key challenge for deploying Graph Neural Networks (GNNs) in real-world domains such as healthcare and sensor networks. Existing studies mostly address relatively benign scenarios, namely benchmark datasets with (a) high-dimensional but sparse node features and (b) incomplete data generated under Missing Completely At Random (MCAR) mechanisms. For (a), we theoretically prove that high sparsity substantially limits the information loss caused by missingness, making all models appear robust and preventing a meaningful comparison of their performance. To overcome this limitation, we introduce one synthetic and three real-world datasets with dense, semantically meaningful features. For (b), we move beyond MCAR and design evaluation protocols with more realistic missingness mechanisms. Moreover, we provide a theoretical background to state explicit assumptions on the missingness process and analyze their implications for different methods. Building on this analysis, we propose GNNmim, a simple yet effective baseline for node classification with incomplete feature data. Experiments show that GNNmim is competitive with respect to specialized architectures across diverse datasets and missingness regimes.
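The sparsity argument in point (a) can be illustrated with a small simulation. The sketch below is not the paper's proof and is not GNNmim. It assumes binary features, an MCAR mask drawn independently for every entry, and zero-imputation of missing values, and all function names are hypothetical. Under those assumptions, a masked entry only changes if it was non-zero, so the fraction of altered entries is roughly the missingness rate times the feature density.

```python
import random


def make_features(n_rows, n_cols, density, rng):
    """Binary feature matrix; each entry is 1.0 with probability `density`."""
    return [[1.0 if rng.random() < density else 0.0 for _ in range(n_cols)]
            for _ in range(n_rows)]


def mcar_mask(n_rows, n_cols, p_missing, rng):
    """MCAR mask: True marks a missing entry, drawn independently of the data."""
    return [[rng.random() < p_missing for _ in range(n_cols)]
            for _ in range(n_rows)]


def zero_impute(features, mask):
    """Replace every masked entry with 0.0 (an assumed, simple imputation)."""
    return [[0.0 if missing else value for value, missing in zip(row, mask_row)]
            for row, mask_row in zip(features, mask)]


def fraction_changed(original, imputed):
    """Share of entries whose value differs after masking and imputation."""
    total = sum(len(row) for row in original)
    changed = sum(a != b
                  for row_o, row_i in zip(original, imputed)
                  for a, b in zip(row_o, row_i))
    return changed / total


def changed_under_mcar(density, p_missing=0.5, n_rows=200, n_cols=100, seed=0):
    """Simulate one matrix and report how much of it MCAR masking alters."""
    rng = random.Random(seed)
    features = make_features(n_rows, n_cols, density, rng)
    mask = mcar_mask(n_rows, n_cols, p_missing, rng)
    return fraction_changed(features, zero_impute(features, mask))


if __name__ == "__main__":
    print("sparse (1% non-zero):", changed_under_mcar(density=0.01))
    print("dense (90% non-zero):", changed_under_mcar(density=0.9))
```

With half of the entries masked, the sparse matrix is left almost unchanged while the dense one loses close to half of its non-zero values, which is consistent with the abstract's claim that sparse benchmarks make models look robust to missingness.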

Country of Origin
🇮🇹 Italy

Page Count
33 pages

Category
Computer Science:
Machine Learning (CS)