Guide your favorite protein sequence generative model
By: Junhao Xiong, Hunter Nisonoff, Maria Lukarska and more
Potential Business Impact:
Designs new proteins with special abilities.
Generative machine learning models of sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, in a plug-and-play manner. Herein, we present ProteinGuide, a principled and general method for conditioning, built by unifying a broad class of protein generative models under a single framework. We demonstrate the applicability of ProteinGuide by guiding two protein generative models, ProteinMPNN and ESM3, to generate amino acid and structure token sequences conditioned on several user-specified properties, such as enhanced stability, enzyme classes, and CATH-labeled folds. We also used ProteinGuide with inverse folding models and our own experimental assay to design adenine base editor sequences with high activity.
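To make the idea of plug-and-play conditioning concrete, the sketch below shows a generic form of predictor guidance for a single sequence position: a pretrained model's distribution over amino acids, p(x), is reweighted by a property predictor's likelihood, p(y | x), using Bayes' rule. This is an illustration only and is not taken from the ProteinGuide paper. The function name `guided_distribution`, the `strength` parameter, and the toy probabilities are all assumptions made for this example.

```python
# Illustrative sketch of predictor guidance at one sequence position.
# Not the ProteinGuide implementation: names and numbers are hypothetical.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def guided_distribution(prior_probs, predictor_probs, strength=1.0):
    """Return p(x | y) proportional to p(x) * p(y | x) ** strength.

    prior_probs: the generative model's probability for each amino acid.
    predictor_probs: a predictor's probability that the desired property
        holds, given each amino acid at this position.
    strength: 0 recovers the unguided prior; larger values guide harder.
    """
    if len(prior_probs) != len(predictor_probs):
        raise ValueError("prior and predictor must cover the same tokens")
    weights = [p * (q ** strength) for p, q in zip(prior_probs, predictor_probs)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("guided weights sum to zero; cannot normalize")
    return [w / total for w in weights]


# Toy inputs: a uniform prior, and a made-up predictor that favors I, L, V.
prior = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
predictor = [0.9 if aa in "ILV" else 0.1 for aa in AMINO_ACIDS]

posterior = guided_distribution(prior, predictor)
unguided = guided_distribution(prior, predictor, strength=0.0)
```

In this toy setting the guided distribution shifts probability mass toward the residues the predictor favors, while the pretrained model itself is left untouched. That separation between generator and predictor is what "plug-and-play" refers to here.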