Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics
By: Jiahao Wang, Shuangjia Zheng
Potential Business Impact:
Designs better proteins for medicine and science.
The ability to engineer optimized protein variants has transformative potential for biotechnology and medicine. Prior sequence-based optimization methods struggle with the high-dimensional complexity caused by epistasis and disregard structural constraints. To address this, we propose HADES, a Bayesian optimization method that uses Hamiltonian dynamics to sample efficiently from a structure-aware approximate posterior. By leveraging momentum and uncertainty in the simulated physical movements, HADES moves proposals rapidly toward promising regions. A position discretization procedure is introduced to propose discrete protein sequences from this continuous state system. The posterior surrogate is powered by a two-stage encoder-decoder framework that captures the structure-function relationships between mutant neighbors and thereby learns a smoothed landscape to sample from. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines in in-silico evaluations across most metrics. Notably, our approach leverages the mutual constraints between protein structure and sequence, facilitating the design of protein sequences with similar structures and optimized properties. The code and data are publicly available at https://github.com/GENTEL-lab/HADES.
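To make the sampling idea in the abstract concrete, the following is a minimal sketch of generic Hamiltonian Monte Carlo over a continuous per-position state, followed by an argmax discretization into a sequence. It is not the HADES implementation: the quadratic `energy` stands in for the learned structure-aware surrogate, the argmax rule in `discretize` is only one plausible reading of "position discretization", and every name here (`energy`, `leapfrog`, `hmc_step`, `discretize`, `run_chain`, `target`) is hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def energy(q, target):
    # Toy potential (assumption): a quadratic bowl around `target`.
    # A real surrogate would be a learned model of negative fitness.
    return 0.5 * np.sum((q - target) ** 2)


def grad_energy(q, target):
    # Analytic gradient of the toy potential above.
    return q - target


def leapfrog(q, p, target, step_size, n_steps):
    # Standard leapfrog integrator: half momentum step, alternating
    # full position/momentum steps, then a final half momentum step.
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_energy(q, target)
    for i in range(n_steps):
        q += step_size * p
        if i < n_steps - 1:
            p -= step_size * grad_energy(q, target)
    p -= 0.5 * step_size * grad_energy(q, target)
    return q, p


def hmc_step(q, target, rng, step_size=0.1, n_steps=20):
    # Draw a fresh momentum, simulate the dynamics, then apply a
    # Metropolis correction based on the change in total energy.
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, target, step_size, n_steps)
    h_old = energy(q, target) + 0.5 * np.sum(p0 ** 2)
    h_new = energy(q_new, target) + 0.5 * np.sum(p_new ** 2)
    if np.log(rng.random()) < h_old - h_new:
        return q_new, True
    return q, False


def discretize(q):
    # Map a (length, 20) continuous state to a sequence by taking the
    # highest-scoring residue at each position (illustrative rule only).
    return "".join(AMINO_ACIDS[i] for i in q.argmax(axis=1))


def run_chain(seq_len=8, n_samples=5, seed=0):
    # Run a short chain and collect one discrete proposal per HMC step.
    rng = np.random.default_rng(seed)
    target = rng.standard_normal((seq_len, len(AMINO_ACIDS)))
    q = np.zeros_like(target)
    proposals = []
    for _ in range(n_samples):
        q, _accepted = hmc_step(q, target, rng)
        proposals.append(discretize(q))
    return proposals
```

Calling `run_chain(seq_len=8, n_samples=5, seed=0)` returns a list of five 8-residue strings. The momentum term is what lets each proposal travel far from the current state in a single step, which is the property the abstract credits for rapid movement toward promising regions.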
Similar Papers
Seek and You Shall Fold
Machine Learning (CS)
Creates protein shapes from experimental clues.
Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search
Artificial Intelligence
Designs new proteins by learning from nature's code.
Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search
Artificial Intelligence
Helps make new proteins faster and better.