DiSE: A diffusion probabilistic model for automatic structure elucidation of organic compounds
By: Haochen Chen, Qi Huang, Anan Wu and more
Potential Business Impact:
Lets computers guess molecule shapes from data.
Automatic structure elucidation is essential for self-driving laboratories because it enables the system to operate truly autonomously. This capability closes the experimental feedback loop, ensuring that machine learning models receive reliable structure information for real-time decision-making and optimization. Herein, we present DiSE, an end-to-end diffusion-based generative model that integrates multiple spectroscopic modalities, including MS, 13C and 1H chemical shifts, HSQC, and COSY, to achieve automated yet accurate structure elucidation of organic compounds. By learning inherent correlations among spectra through data-driven approaches, DiSE achieves superior accuracy, strong generalization across chemically diverse datasets, and robustness to experimental data despite being trained on calculated spectra. DiSE thus represents a significant advance toward fully automated structure elucidation, with broad potential in natural product research, drug discovery, and self-driving laboratories.
Similar Papers
DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
Machine Learning (CS)
Finds molecule shapes from sound and light.
Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
Machine Learning (CS)
Finds drug structures automatically from simple tests.
SiDGen: Structure-informed Diffusion for Generative modeling of Ligands for Proteins
Machine Learning (CS)
Designs new medicines that fit into the body.