Score: 3

Topotein: Topological Deep Learning for Protein Representation Learning

Published: September 4, 2025 | arXiv ID: 2509.03885v1

By: Zhiyu Wang, Arian Jamasb, Mustafa Hajij and more

Potential Business Impact:

Helps scientists classify protein folds and relate protein structure to function.

Business Areas:
Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Protein representation learning (PRL) is crucial for understanding structure-function relationships, yet current sequence- and graph-based methods fail to capture the hierarchical organization inherent in protein structures. We introduce Topotein, a comprehensive framework that applies topological deep learning to PRL through the novel Protein Combinatorial Complex (PCC) and Topology-Complete Perceptron Network (TCPNet). Our PCC represents proteins at multiple hierarchical levels -- from residues to secondary structures to complete proteins -- while preserving geometric information at each level. TCPNet employs SE(3)-equivariant message passing across these hierarchical structures, enabling more effective capture of multi-scale structural patterns. Through extensive experiments on four PRL tasks, TCPNet consistently outperforms state-of-the-art geometric graph neural networks. Our approach demonstrates particular strength in tasks such as fold classification which require understanding of secondary structure arrangements, validating the importance of hierarchical topological features for protein analysis.
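
As a rough illustration of the hierarchy described in the abstract, the sketch below groups residues into secondary structure elements and a whole-protein cell, keeping a centroid at each level as a stand-in for the geometric information. It is not the authors' PCC implementation: the function name `build_hierarchy`, the dictionary layout, the toy coordinates, and the use of centroids are all assumptions made for illustration. It also omits residue-residue edges and the SE(3)-equivariant message passing that TCPNet performs.

```python
from itertools import groupby


def build_hierarchy(coords, ss_labels):
    """Group residues into residue, secondary-structure, and protein cells.

    Hypothetical helper for illustration only (not the paper's API).
    coords: one (x, y, z) tuple per residue.
    ss_labels: one secondary structure label per residue, e.g. "H" or "E".
    """
    if len(coords) != len(ss_labels):
        raise ValueError("coords and ss_labels must have the same length")
    if not coords:
        raise ValueError("at least one residue is required")

    def centroid(indices):
        # Mean position of the residues that belong to a cell.
        n = len(indices)
        return tuple(sum(coords[i][k] for i in indices) / n for k in range(3))

    # Lowest level: one cell per residue.
    residues = [
        {"members": (i,), "centroid": tuple(coords[i])}
        for i in range(len(coords))
    ]

    # Middle level: each contiguous run of identical labels becomes one cell.
    sses = []
    for label, run in groupby(enumerate(ss_labels), key=lambda pair: pair[1]):
        members = tuple(index for index, _ in run)
        sses.append(
            {"label": label, "members": members, "centroid": centroid(members)}
        )

    # Top level: a single cell covering the whole protein.
    all_indices = tuple(range(len(coords)))
    protein = {"members": all_indices, "centroid": centroid(all_indices)}

    return {"residue": residues, "sse": sses, "protein": protein}


# Toy input: four residues, two in a helix ("H") and two in a strand ("E").
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 2.0, 0.0)]
ss_labels = ["H", "H", "E", "E"]
complex_ = build_hierarchy(coords, ss_labels)
```

Here `complex_["sse"]` holds two cells, one per secondary structure run, and every cell records which residues it contains. That membership relation is the kind of structure a hierarchical message-passing scheme could operate over.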

Country of Origin
🇺🇸 🇬🇧 United States, United Kingdom

Page Count
22 pages

Category
Computer Science:
Machine Learning (CS)