Protein Language Model Zero-Shot Fitness Predictions are Improved by Inference-only Dropout
By: Aditya Ravuri, Neil D. Lawrence
Potential Business Impact:
Improves computer predictions of how well proteins work.
Protein Language Models (PLMs) such as ESM2 have been shown to be capable of zero-shot prediction of critical scalar properties of proteins (fitness). In this work, we show that injecting a dropout layer at inference time between a PLM's featurizer/embedding layer and its transformer, and averaging the model's outputs akin to Monte-Carlo dropout, increases zero-shot performance on a subset of the ProteinGym dataset. This holds even when the model was not trained with dropout to begin with, and requires no retraining or finetuning of the PLM. A dropout rate of 0.1 performs well across all models.
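The abstract's recipe (dropout injected between the embedding layer and the transformer, outputs averaged over several stochastic passes) can be illustrated with a short sketch. This is not the authors' code: it assumes the HuggingFace `transformers` checkpoint `facebook/esm2_t6_8M_UR50D`, a wild-type-marginal scoring rule, and illustrative values for the number of passes; only the dropout rate of 0.1 comes from the abstract.

```python
# Minimal sketch: inference-only Monte-Carlo dropout for zero-shot fitness
# scoring with ESM2 (assumptions noted in the lead-in above).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, EsmForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")
model.eval()  # all trained layers stay deterministic; no retraining needed

# Inject dropout between the embedding layer and the transformer at inference
# time: a forward hook replaces the embedding output with a stochastically
# masked version on every pass, even though the model saw no such dropout
# during training.
def embedding_dropout_hook(module, inputs, output, p=0.1):
    return F.dropout(output, p=p, training=True)

hook = model.esm.embeddings.register_forward_hook(embedding_dropout_hook)

wild_type = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy sequence
inputs = tokenizer(wild_type, return_tensors="pt")

# Average token log-probabilities over several stochastic forward passes,
# akin to Monte-Carlo dropout (pass count is an illustrative choice).
n_passes = 20
with torch.no_grad():
    log_probs = torch.stack([
        torch.log_softmax(model(**inputs).logits, dim=-1)
        for _ in range(n_passes)
    ]).mean(dim=0)

hook.remove()

# Zero-shot score for a substitution, e.g. A4G (1-indexed on the sequence):
# log p(mutant residue) - log p(wild-type residue) at that position.
pos, wt_aa, mut_aa = 4, "A", "G"
tok_pos = pos  # the tokenizer prepends a BOS token, shifting indices by one
wt_id = tokenizer.convert_tokens_to_ids(wt_aa)
mut_id = tokenizer.convert_tokens_to_ids(mut_aa)
score = (log_probs[0, tok_pos, mut_id] - log_probs[0, tok_pos, wt_id]).item()
print(f"{wt_aa}{pos}{mut_aa} fitness score: {score:.3f}")
```

Averaging in log-probability space is one plausible reading of "averaging its output"; averaging probabilities before taking logs would be an equally simple variant of the same idea.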
Similar Papers
Exploring zero-shot structure-based protein fitness prediction
Quantitative Methods
Predicts how protein changes affect how well they work.
ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning
Machine Learning (CS)
Designs better proteins with fewer failures.
InstructPLM-mu: 1-Hour Fine-Tuning of ESM2 Beats ESM3 in Protein Mutation Predictions
Quantitative Methods
Makes protein predictions faster and cheaper.