Topotein: Topological Deep Learning for Protein Representation Learning
By: Zhiyu Wang, Arian Jamasb, Mustafa Hajij and more
Potential Business Impact:
Helps scientists relate protein structure to function, for example by classifying protein folds.
Protein representation learning (PRL) is crucial for understanding structure-function relationships, yet current sequence- and graph-based methods fail to capture the hierarchical organization inherent in protein structures. We introduce Topotein, a comprehensive framework that applies topological deep learning to PRL through the novel Protein Combinatorial Complex (PCC) and Topology-Complete Perceptron Network (TCPNet). Our PCC represents proteins at multiple hierarchical levels -- from residues to secondary structures to complete proteins -- while preserving geometric information at each level. TCPNet employs SE(3)-equivariant message passing across these hierarchical structures, enabling more effective capture of multi-scale structural patterns. In extensive experiments on four PRL tasks, TCPNet consistently outperforms state-of-the-art geometric graph neural networks. Our approach is particularly strong on tasks such as fold classification, which require an understanding of how secondary structures are arranged, validating the importance of hierarchical topological features for protein analysis.
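To make the hierarchy described in the abstract concrete, the following is a minimal sketch of grouping residues into secondary-structure segments and then into a whole-protein cell, with a centroid kept at each level as a simple stand-in for geometric information. It is not the authors' PCC construction: the function name `build_hierarchy`, the dictionary layout, the single-letter secondary-structure labels, and the use of centroids are all assumptions made for illustration.

```python
from itertools import groupby


def centroid(coords, members):
    """Mean 3D position of the residues listed in `members`."""
    n = len(members)
    return tuple(sum(coords[i][k] for i in members) / n for k in range(3))


def build_hierarchy(ss_labels, coords):
    """Hypothetical three-level hierarchy: residues -> segments -> protein.

    ss_labels: one secondary-structure label per residue (e.g. "HHHCCEE").
    coords:    one (x, y, z) tuple per residue (e.g. C-alpha positions).
    """
    if len(ss_labels) != len(coords):
        raise ValueError("need one coordinate per residue")

    # Lowest level: one cell per residue, carrying its own position.
    residues = [{"members": (i,), "centroid": tuple(coords[i])}
                for i in range(len(coords))]

    # Middle level: contiguous runs of the same label form one segment.
    segments, start = [], 0
    for label, run in groupby(ss_labels):
        length = len(list(run))
        members = tuple(range(start, start + length))
        segments.append({"label": label, "members": members,
                         "centroid": centroid(coords, members)})
        start += length

    # Top level: a single cell containing every residue.
    everyone = tuple(range(len(coords)))
    protein = {"members": everyone, "centroid": centroid(coords, everyone)}

    return {"residues": residues, "sse": segments, "protein": protein}


if __name__ == "__main__":
    toy_coords = [(float(i), 0.0, 0.0) for i in range(7)]
    hierarchy = build_hierarchy("HHHCCEE", toy_coords)
    for seg in hierarchy["sse"]:
        print(seg["label"], seg["members"], seg["centroid"])
```

In a sketch like this, each higher-level cell records which residues it contains, so features could in principle be passed between levels along those containment relations. How TCPNet actually does this is not specified in the abstract.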