Pushing the limits of one-dimensional NMR spectroscopy for automated structure elucidation using artificial intelligence
By: Frank Hu, Jonathan M. Tubb, Dimitris Argyropoulos, and more
One-dimensional NMR spectroscopy is one of the most widely used techniques for characterizing organic compounds and natural products. For molecules with up to 36 non-hydrogen atoms, the number of possible structures has been estimated to range from $10^{20}$ to $10^{60}$. Determining the structure (formula and connectivity) of a molecule of this size from its one-dimensional $^1$H and/or $^{13}$C NMR spectrum alone, i.e., de novo structure generation, therefore appears completely intractable. Here we show that this task can be achieved for systems with up to 40 non-hydrogen atoms across the full elemental coverage typically encountered in organic chemistry (C, N, O, H, P, S, Si, B, and the halogens) using a deep learning framework, thus covering a vast portion of drug-like chemical space. Leveraging insights from natural language processing, we show that our transformer-based architecture, given only the $^1$H and $^{13}$C NMR spectra, achieves 55.2% top-15 accuracy (the correct molecule appears among its first 15 predictions). It thereby overcomes the combinatorial growth of chemical space while remaining extensible to experimental data via fine-tuning.
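The headline figure is a top-k metric: a molecule counts as solved if the correct structure appears anywhere among the model's first k ranked candidates (here k = 15). Below is a minimal sketch of that computation. It assumes every candidate and target has already been converted to a canonical string, such as canonical SMILES, so that plain string equality means "same molecule". The function name `top_k_accuracy`, the toy SMILES, and this matching rule are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 15,
) -> float:
    """Fraction of samples whose target appears among the first k predictions.

    Assumption (hypothetical): predictions and targets are already canonical
    strings (e.g. canonical SMILES), so equality means "same molecule".
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets:
        raise ValueError("no samples to score")
    # Count a hit when the true structure is anywhere in the top-k slice.
    hits = sum(
        target in preds[:k]
        for preds, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Hypothetical toy data: ranked candidate lists for four spectra.
toy_predictions = [
    ["CCO", "COC"],           # correct answer ranked 2nd
    ["CC(=O)O", "OCC=O"],     # correct answer ranked 1st
    ["c1ccccc1", "C1CCCCC1"], # correct answer missing
    ["CCN", "CNC"],           # correct answer ranked 2nd
]
toy_targets = ["COC", "CC(=O)O", "CCCC", "CNC"]

print(top_k_accuracy(toy_predictions, toy_targets, k=1))  # 0.25
print(top_k_accuracy(toy_predictions, toy_targets, k=2))  # 0.75
```

Because a hit anywhere in the slice counts, top-k accuracy can only stay the same or rise as k grows. This is why a top-15 figure is a looser criterion than top-1 and should be read together with the number of candidates k.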
Similar Papers
Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
Machine Learning (CS)
Finds drug structures automatically from simple tests.
DiffNMR: Diffusion Models for Nuclear Magnetic Resonance Spectra Elucidation
Chemical Physics
Helps scientists figure out what molecules look like.
Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra
Artificial Intelligence
Finds new chemicals from their broken pieces.