Why does your graph neural network fail on some graphs? Insights from exact generalisation error

Published: September 12, 2025 | arXiv ID: 2509.10337v1

By: Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar

Potential Business Impact:

Explains when and why computer "brains" learn well from connected data, and when they fail.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Graph Neural Networks (GNNs) are widely used in learning on graph-structured data, yet a principled understanding of why they succeed or fail remains elusive. While prior works have examined architectural limitations such as over-smoothing and over-squashing, these do not explain what enables GNNs to extract meaningful representations or why performance varies drastically between similar architectures. These questions are related to the role of generalisation: the ability of a model to make accurate predictions on unlabelled data. Although several works have derived generalisation error bounds for GNNs, these are typically loose, restricted to a single architecture, and offer limited insight into what governs generalisation in practice. In this work, we take a different approach by deriving the exact generalisation error for GNNs in a transductive fixed-design setting through the lens of signal processing. From this viewpoint, GNNs can be interpreted as graph filter operators that act on node features via the graph structure. By focusing on linear GNNs while allowing non-linearity in the graph filters, we derive the first exact generalisation error for a broad range of GNNs, including convolutional, PageRank-based, and attention-based models. The exact characterisation of the generalisation error reveals that only the aligned information between node features and graph structure contributes to generalisation. Furthermore, we quantify the effect of homophily on generalisation. Our work provides a framework that explains when and why GNNs can effectively leverage structural and feature information, offering practical guidance for model selection.
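
The filter view described in the abstract can be illustrated with a small sketch. This is not the paper's derivation or code: the choice of filter (a GCN-style symmetrically normalised adjacency with self-loops), the ridge-regression fit, the toy two-clique graph, and all function names are assumptions made only for illustration. It shows a linear GNN as a graph filter applied to node features, trained on the labelled nodes and evaluated on the remaining nodes of the same fixed graph (the transductive setting).

```python
import numpy as np

def gcn_filter(adj):
    """Assumed filter: D^{-1/2} (A + I) D^{-1/2}, the GCN-style normalised adjacency."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def fit_linear_gnn(filt, feats, labels, train_idx, ridge=1e-3):
    """Ridge regression on filtered features, fitted on labelled nodes only.

    Returns predictions for every node in the graph.
    """
    z = filt @ feats                             # graph filter acts on node features
    z_tr = z[train_idx]
    gram = z_tr.T @ z_tr + ridge * np.eye(z.shape[1])
    weights = np.linalg.solve(gram, z_tr.T @ labels[train_idx])
    return z @ weights

# Hypothetical toy graph: two 4-node cliques joined by a single edge.
n = 8
adj = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

# Labels follow the cliques (a homophilous graph); features are noisy copies of the label.
rng = np.random.default_rng(0)
labels = np.array([1.0] * 4 + [-1.0] * 4)
feats = labels[:, None] * np.array([[1.0, 0.5]]) + 0.3 * rng.standard_normal((n, 2))

train_idx = np.array([0, 1, 4, 5])               # labelled nodes
test_idx = np.array([2, 3, 6, 7])                # unlabelled nodes of the same graph

filt = gcn_filter(adj)
preds = fit_linear_gnn(filt, feats, labels, train_idx)
test_mse = float(np.mean((preds[test_idx] - labels[test_idx]) ** 2))
print("test MSE on unlabelled nodes:", test_mse)
```

Swapping `gcn_filter` for another operator (for example a PageRank-style or attention-style filter) leaves the rest of the sketch unchanged, which is the sense in which the abstract treats different architectures as different graph filters.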

Country of Origin
🇩🇪 Germany

Page Count
29 pages

Category
Statistics:
Machine Learning (Stat)