Accurate de novo sequencing of the modified proteome with OmniNovo
By: Yuhan Chen, Shang Qu, Zhiqiang Gao and more
Post-translational modifications (PTMs) serve as a dynamic chemical language regulating protein function, yet current proteomic methods remain blind to a vast portion of the modified proteome. Standard database search algorithms suffer from a combinatorial explosion of search spaces, limiting the identification of uncharacterized or complex modifications. Here we introduce OmniNovo, a unified deep learning framework for reference-free sequencing of unmodified and modified peptides directly from tandem mass spectra. Unlike existing tools restricted to specific modification types, OmniNovo learns universal fragmentation rules to decipher diverse PTMs within a single coherent model. By integrating a mass-constrained decoding algorithm with rigorous false discovery rate estimation, OmniNovo achieves state-of-the-art accuracy, identifying 51% more peptides than standard approaches at a 1% false discovery rate. Crucially, the model generalizes to biological sites unseen during training, illuminating the dark matter of the proteome and enabling unbiased comprehensive analysis of cellular regulation.