Fused Gromov-Wasserstein Contrastive Learning for Effective Enzyme-Reaction Screening
By: Gengmo Zhou, Feng Yu, Wenda Wang and more
Enzymes are crucial catalysts that enable a wide range of biochemical reactions. Efficiently identifying specific enzymes from vast protein libraries is essential for advancing biocatalysis. Traditional computational methods for enzyme screening and retrieval are time-consuming and resource-intensive. Recently, deep learning approaches have shown promise. However, these methods focus solely on the interaction between enzymes and reactions, overlooking the inherent hierarchical relationships within each domain. To address these limitations, we introduce FGW-CLIP, a novel contrastive learning framework based on optimizing the fused Gromov-Wasserstein distance. FGW-CLIP incorporates multiple alignments: inter-domain alignment between reactions and enzymes, and intra-domain alignment within enzymes and within reactions. By introducing a tailored regularization term, our method minimizes the Gromov-Wasserstein distance between the enzyme and reaction spaces, which enhances information integration across these domains. Extensive evaluations demonstrate the superiority of FGW-CLIP in challenging enzyme-reaction tasks. On the widely used EnzymeMap benchmark, FGW-CLIP achieves state-of-the-art performance in enzyme virtual screening, as measured by the BEDROC and EF metrics. Moreover, FGW-CLIP consistently outperforms prior methods across all three splits of ReactZyme, the largest enzyme-reaction benchmark, demonstrating robust generalization to novel enzymes and reactions. These results position FGW-CLIP as a promising framework for enzyme discovery in complex biochemical settings, with strong adaptability across diverse screening scenarios.
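The abstract does not state the exact objective, so the following is only a minimal sketch of how an inter-domain contrastive term and an intra-domain Gromov-Wasserstein-style regularizer could be combined. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the cosine-similarity structure matrices, the fixed identity coupling between paired enzymes and reactions (instead of an optimized transport plan), the weight `alpha`, and the temperature value.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(z, axis):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def inter_domain_loss(enz, rxn, temperature=0.07):
    # Symmetric InfoNCE (CLIP-style) loss: row i of enz is paired with row i of rxn.
    logits = enz @ rxn.T / temperature
    idx = np.arange(len(enz))
    loss_e2r = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_r2e = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_e2r + loss_r2e)

def gw_discrepancy(enz, rxn):
    # Squared-loss Gromov-Wasserstein cost evaluated at a fixed identity coupling
    # (assumption): mean squared gap between the two intra-domain similarity matrices.
    c_enz = enz @ enz.T  # enzyme-enzyme structure
    c_rxn = rxn @ rxn.T  # reaction-reaction structure
    return np.mean((c_enz - c_rxn) ** 2)

def fgw_style_loss(enz, rxn, alpha=0.5, temperature=0.07):
    # Hypothetical fused objective: alpha trades off the feature-matching
    # (contrastive) term against the structure-matching (GW-style) term.
    enz, rxn = l2_normalize(enz), l2_normalize(rxn)
    contrastive = inter_domain_loss(enz, rxn, temperature)
    structural = gw_discrepancy(enz, rxn)
    return (1.0 - alpha) * contrastive + alpha * structural
```

In this sketch, `alpha = 0` reduces to a plain CLIP-style loss, while larger values push the pairwise similarity structure among enzymes to agree with that among their paired reactions. A full fused Gromov-Wasserstein formulation would also optimize over the coupling rather than fixing it.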
Similar Papers
Transition States Energies from Machine Learning: An Application to Reverse Water-Gas Shift on Single-Atom Alloys
Materials Science
Finds better materials for chemical reactions.
Multimodal Quantum Vision Transformer for Enzyme Commission Classification from Biochemical Representations
Machine Learning (CS)
Helps scientists guess what tiny body helpers do.
Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery
Biomolecules
Predicts drug-protein bonds more accurately in liquids.