Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
By: Ziyu Xiong, Yichi Zhang, Foyez Alauddin, and more
Potential Business Impact:
Finds drug structures automatically from NMR scans.
Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural products and clinical therapeutics. Yet, interpreting NMR spectra remains a time-consuming, manual process requiring extensive domain expertise. We introduce ChefNMR (CHemical Elucidation From NMR), an end-to-end framework that directly predicts an unknown molecule's structure solely from its 1D NMR spectra and chemical formula. We frame structure elucidation as conditional generation from an atomic diffusion model built on a non-equivariant transformer architecture. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%. This work takes a significant step toward solving the grand challenge of automating small-molecule structure elucidation and highlights the potential of deep learning in accelerating molecular discovery. Code is available at https://github.com/ml-struct-bio/chefnmr.
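The abstract frames structure elucidation as conditional generation with an atomic diffusion model. The sketch below illustrates two generic ingredients of that framing under stated assumptions. It is not ChefNMR's implementation. It assumes the model diffuses 3D atom coordinates and that the chemical formula fixes which atoms are generated. `parse_formula`, `cosine_alpha_bar`, and `noise_coordinates` are hypothetical helpers. The cosine noise schedule (Nichol and Dhariwal) is chosen only for illustration. The spectrum-conditioned transformer denoiser that would reverse this process is omitted.

```python
import math
import random
import re


def parse_formula(formula: str) -> list[str]:
    """Expand a simple molecular formula such as 'C9H8O4' into atom symbols.

    Hypothetical helper: handles element symbols with optional counts only
    (no parentheses, isotopes, or charges).
    """
    atoms: list[str] = []
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms.extend([symbol] * (int(count) if count else 1))
    return atoms


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal fraction alpha_bar(t) for t in [0, 1] (cosine schedule)."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def noise_coordinates(coords, t, rng):
    """Forward diffusion q(x_t | x_0): x_t = sqrt(abar) * x_0 + sqrt(1 - abar) * eps.

    Returns the noised coordinates and the Gaussian noise eps, which a denoiser
    (assumed here, not implemented) would be trained to predict.
    """
    abar = cosine_alpha_bar(t)
    a, b = math.sqrt(abar), math.sqrt(1.0 - abar)
    noisy, eps = [], []
    for xyz in coords:
        e = [rng.gauss(0.0, 1.0) for _ in range(3)]
        eps.append(e)
        noisy.append([a * x + b * n for x, n in zip(xyz, e)])
    return noisy, eps


if __name__ == "__main__":
    atoms = parse_formula("C9H8O4")  # aspirin's formula, used only as an example
    rng = random.Random(0)
    # Placeholder "clean" coordinates; real training would use molecular geometries.
    coords = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in atoms]
    noisy, eps = noise_coordinates(coords, 0.5, random.Random(1))
    print(len(atoms), noisy[0])
```

At t = 0 the coordinates are returned unchanged, and as t approaches 1 they become almost pure Gaussian noise. In this assumed setup, generation would start from noise over the atoms given by the formula and repeatedly apply a learned denoiser conditioned on the 1D NMR spectra.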
Similar Papers
DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
Machine Learning (CS)
Finds molecule shapes from sound and light.
Seek and You Shall Fold
Machine Learning (CS)
Creates protein shapes from experimental clues.
Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra
Artificial Intelligence
Finds new chemicals from their broken pieces.