
Aligning Proteins and Language: A Foundation Model for Protein Retrieval

Published: May 27, 2025 | arXiv ID: 2506.08023v1

By: Qifeng Wu, Zhengzhe Liu, Han Zhu and more

Potential Business Impact:

Finds what proteins do from their 3D shapes.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

This paper aims to retrieve proteins with similar structures and semantics from a large-scale protein dataset, facilitating the functional interpretation of protein structures obtained by structure-determination methods such as cryo-electron microscopy (cryo-EM). Motivated by recent progress in vision-language models (VLMs), we propose a CLIP-style framework that aligns 3D protein structures with functional annotations using contrastive learning. For model training, we introduce a large-scale dataset of approximately 200,000 protein-caption pairs with rich functional descriptors. We evaluate our model on both in-domain retrieval on the Protein Data Bank (PDB) and more challenging cross-database retrieval on the Electron Microscopy Data Bank (EMDB). In both cases, our approach demonstrates promising zero-shot retrieval performance, highlighting the potential of multimodal foundation models for structure-function understanding in protein biology.
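To make the CLIP-style idea concrete, here is a minimal sketch of the two pieces such a framework generally relies on: a symmetric contrastive (InfoNCE) loss that pulls matched structure and caption embeddings together while pushing mismatched pairs apart, and zero-shot retrieval by ranking cosine similarities. This is an illustration of the general technique, not the authors' implementation. The embeddings are hypothetical toy vectors standing in for the outputs of a 3D structure encoder and a text encoder (neither is shown), and the temperature of 0.07 is an assumption borrowed from CLIP's usual initial value, not a value reported here. The helper names (`clip_loss`, `retrieve`, and so on) are illustrative only.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(a, b):
    # a: N vectors, b: M vectors -> N x M matrix of cosine similarities.
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def _cross_entropy_diag(logits):
    # Mean cross-entropy where row i's correct class is column i
    # (the i-th structure matches the i-th caption). Uses log-sum-exp
    # with max subtraction for numerical stability.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def clip_loss(struct_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE: average the structure->text and text->structure losses.
    # temperature=0.07 is an assumed value (CLIP's common initialization).
    sims = cosine_matrix(struct_emb, text_emb)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for text->structure
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


def retrieve(query_emb, gallery_emb, k=1):
    # Zero-shot retrieval: rank gallery items by cosine similarity to the query
    # and return the indices of the top-k matches.
    sims = cosine_matrix([query_emb], gallery_emb)[0]
    return sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy embeddings: structure i is meant to match caption i.
    structures = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    captions = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    print("aligned loss:", clip_loss(structures, captions))
    print("shuffled loss:", clip_loss(structures, captions[::-1]))
    print("caption 1 -> structures:", retrieve(captions[1], structures, k=2))
```

In this toy setup the correctly paired batch yields a lower loss than the shuffled one, and querying with caption 1 ranks structure 1 first. Because retrieval only compares embeddings in the shared space, the same `retrieve` call works in either direction (text to structure or structure to text), which is what allows zero-shot use on a database such as EMDB without retraining.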