Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search
By: Yaodong Yang, Yang Wang, Jinpeng Li and more
Potential Business Impact:
Designs new proteins by learning from nature's code.
Protein evolution through amino acid sequence mutations is a cornerstone of life sciences. Current in-silicon directed evolution algorithms largely focus on designing heuristic search strategies, and they overlook how to integrate protein language models, which encode rich evolutionary patterns, with reinforcement learning so that a model learns to evolve proteins directly. To bridge this gap, we propose AlphaDE, a novel framework that optimizes protein sequences by borrowing two paradigms from large language models: fine-tuning and test-time inference. First, AlphaDE fine-tunes pretrained protein language models using masked language modeling on homologous protein sequences to activate evolutionary plausibility for the protein class of interest. Second, AlphaDE introduces test-time inference based on Monte Carlo tree search, which evolves proteins under evolutionary guidance from the fine-tuned protein language model. Extensive benchmark experiments show that AlphaDE remarkably outperforms previous state-of-the-art methods, even with few-shot fine-tuning. A further case study demonstrates that AlphaDE supports condensing the protein sequence space of avGFP through computational evolution.
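The tree-search step described in the abstract can be illustrated with a minimal sketch. This is not AlphaDE's implementation: it is a generic Monte Carlo tree search over single-point mutations, and every name in it (`toy_fitness`, `Node`, `expand`, `mcts_evolve`) is hypothetical. Two simplifying assumptions stand in for the components the abstract names: mutations are proposed uniformly at random where a fine-tuned protein language model would supply guidance, and a toy match-to-target score replaces a real fitness oracle.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Hypothetical fitness: fraction of positions that match a target sequence."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


class Node:
    """One sequence in the search tree, with visit statistics for UCB selection."""

    def __init__(self, seq, parent=None):
        self.seq = seq
        self.parent = parent
        self.children = {}  # (position, amino acid) -> Node
        self.visits = 0
        self.value_sum = 0.0

    def ucb(self, c=1.4):
        # Unvisited children are tried first.
        if self.visits == 0:
            return float("inf")
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def expand(node, rng, n_children=8):
    """Add random single-point mutants as children.

    Assumption: uniform sampling. A language-model prior would rank mutations here.
    """
    while len(node.children) < n_children:
        pos = rng.randrange(len(node.seq))
        aa = rng.choice(AMINO_ACIDS)
        if aa == node.seq[pos] or (pos, aa) in node.children:
            continue
        mutant = node.seq[:pos] + aa + node.seq[pos + 1:]
        node.children[(pos, aa)] = Node(mutant, parent=node)


def mcts_evolve(start, fitness, iterations=200, max_depth=4, seed=0):
    """Search for a fitter sequence at most max_depth mutations away from start."""
    rng = random.Random(seed)
    root = Node(start)
    best_seq, best_fit = start, fitness(start)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: follow the highest-UCB child down to a leaf.
        while node.children and depth < max_depth:
            node = max(node.children.values(), key=Node.ucb)
            depth += 1
        # Expansion: grow a leaf that has already been evaluated once.
        if node.visits > 0 and depth < max_depth:
            expand(node, rng)
            node = next(iter(node.children.values()))
        # Evaluation: score the sequence directly (no rollout in this sketch).
        value = fitness(node.seq)
        if value > best_fit:
            best_seq, best_fit = node.seq, value
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return best_seq, best_fit


if __name__ == "__main__":
    target = "MKTAYIAKQR"  # made-up target, for illustration only
    start = "MATAYAAKQW"
    best, fit = mcts_evolve(start, lambda s: toy_fitness(s, target))
    print(best, fit)
```

The search only tracks the best sequence it has evaluated, so the returned fitness is never lower than that of the starting sequence. Swapping `expand` for a model-guided proposal step and `toy_fitness` for a learned or experimental oracle is where a real system would differ from this sketch.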
Similar Papers
Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search
Artificial Intelligence
Helps make new proteins faster and better.
Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation
Artificial Intelligence
Creates new proteins for medicine and materials.