One Small Step with Fingerprints, One Giant Leap for De Novo Molecule Generation from Mass Spectra
By: Neng Kai Nigel Neo, Lim Jing, Ngoui Yong Zhau Preston, and more
Potential Business Impact:
**Identifies unknown molecules from mass spectra via chemical fingerprints.**
A common approach to de novo molecule generation from mass spectra is a two-stage pipeline: (1) encoding a mass spectrum into a molecular fingerprint, then (2) decoding that fingerprint into a molecular structure. In our work, we adopt MIST as the encoder and MolForge as the decoder, and use additional training data to improve performance. We also threshold the predicted probability of each fingerprint bit so that the decoder receives a binary signal for the presence or absence of each substructure. Together, these choices yield a tenfold improvement over previous state-of-the-art methods: on MassSpecGym, the correct molecular structure is generated for 28% of spectra at top-1 and 36% at top-10. We position this as a strong baseline for future research in de novo molecular structure elucidation from mass spectra.
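The two-stage pipeline, the bit-thresholding step, and the top-k metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the 0.5 cutoff, and the `encoder`/`decoder` callables are hypothetical stand-ins for MIST and MolForge, and candidate structures are compared as plain strings assumed to already be in a canonical form.

```python
from typing import Callable, List, Sequence

def threshold_fingerprint(probs: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Binarize per-bit probabilities: 1 = substructure predicted present.
    The 0.5 cutoff is an assumed value, not one taken from the paper."""
    return [1 if p >= cutoff else 0 for p in probs]

def two_stage_pipeline(
    spectrum,
    encoder: Callable[[object], Sequence[float]],  # hypothetical stand-in for MIST
    decoder: Callable[[List[int]], List[str]],     # hypothetical stand-in for MolForge
    k: int = 10,
    cutoff: float = 0.5,
) -> List[str]:
    """Spectrum -> fingerprint probabilities -> binary fingerprint -> top-k candidate structures."""
    probs = encoder(spectrum)
    fingerprint = threshold_fingerprint(probs, cutoff)
    return decoder(fingerprint)[:k]

def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]], targets: Sequence[str], k: int
) -> float:
    """Fraction of spectra whose true structure appears among the first k candidates.
    Assumes structures are canonical strings; real use would canonicalize with a chemistry toolkit."""
    hits = sum(1 for preds, t in zip(ranked_predictions, targets) if t in preds[:k])
    return hits / len(targets)

if __name__ == "__main__":
    # Toy encoder/decoder, purely illustrative.
    toy_encoder = lambda spectrum: [0.8, 0.1, 0.6]
    toy_decoder = lambda fp: ["mol_" + "".join(map(str, fp)), "decoy"]
    print(two_stage_pipeline(None, toy_encoder, toy_decoder, k=1))
    print(top_k_accuracy([["A", "B"], ["C", "D"]], ["B", "X"], k=2))
```

Thresholding discards the encoder's confidence values, so the decoder only sees which substructures are predicted present; top-1 and top-10 accuracy are then `top_k_accuracy` with `k=1` and `k=10`.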
Similar Papers
Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra
Artificial Intelligence
Finds new chemicals from their broken pieces.
NovoMolGen: Rethinking Molecular Language Model Pretraining
Machine Learning (CS)
Creates new medicines faster and better.