Score: 0

Subcellular proteome niche discovery using semi-supervised functional clustering

Published: December 8, 2025 | arXiv ID: 2512.08087v1

By: Ziyue Zheng , Loay J. Jabre , Matthew McIlvin and more

Intracellular compartmentalization of proteins underpins their function and the metabolic processes they sustain. Various mass spectrometry-based proteomics methods (subcellular spatial proteomics) now allow high throughput subcellular protein localization. Yet, the curation, analysis and interpretation of these data remain challenging, particularly in non-model organisms where establishing reliable marker proteins is difficult, and in contexts where experimental replication and subcellular fractionation are constrained. Here, we develop FSPmix, a semi-supervised functional clustering method implemented as an open-source R package, which leverages partial annotations from a subset of marker proteins to predict protein subcellular localization from subcellular spatial proteomics data. This method explicitly assumes that protein signatures vary smoothly across subcellular fractions, enabling more robust inference under low signal-to-noise data regimes. We applied FSPmix to a subcellular proteomics dataset from a marine diatom, allowing us to assign probabilistic localizations to proteins and uncover potentially new protein functions. Altogether, this work lays the foundation for more robust statistical analysis and interpretation of subcellular proteomics datasets, particularly in understudied organisms.

Category
Quantitative Biology:
Quantitative Methods