Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware SSL
By: Sangyoon Bae, Mehdi Azabou, Jiook Cha and more
Potential Business Impact:
Helps computers understand brain signals better.
Self-supervised learning (SSL) holds great promise for neuroscience, where large-scale, consistently labeled neural datasets are scarce. However, most neural datasets contain heterogeneous populations that mix stable, predictable cells with highly stochastic, stimulus-contingent ones. This makes it hard to identify consistent activity patterns during SSL, and as a result, self-supervised pretraining on neural data has yet to show clear benefits from scale. Here, we present POYO-SSL, a novel approach to self-supervised pretraining that exploits the heterogeneity of neural data to improve pretraining and realize the benefits of scale. Specifically, POYO-SSL pretrains only on predictable (statistically regular) neurons, identified on the pretraining split via simple higher-order statistics (skewness and kurtosis), and then fine-tunes on the unpredictable population for downstream tasks. On the Allen Brain Observatory dataset, this strategy yields relative gains of approximately 12 to 13% over training from scratch and exhibits smooth, monotonic scaling with model size. In contrast, existing state-of-the-art baselines plateau or destabilize as model size increases. By making predictability an explicit criterion for crafting the data diet, POYO-SSL turns heterogeneity from a liability into an asset, providing a robust, biologically grounded recipe for scalable neural decoding and a path toward foundation models of neural dynamics.
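The abstract does not say how the skewness and kurtosis cut-offs are chosen, so the following is only a minimal Python sketch of what a predictability-based split of a calcium-imaging population could look like. It assumes each cell is a 1-D activity trace taken from the pretraining split only, and it assumes that low |skewness| and low excess kurtosis mark a "predictable" cell while heavy-tailed, bursty traces mark an unpredictable one. The function names and the thresholds `max_abs_skew=1.0` and `max_excess_kurt=3.0` are hypothetical, not values from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _central_moments(x: Sequence[float]) -> Tuple[float, float, float]:
    """Return the 2nd, 3rd and 4th central moments (population, 1/n normalization)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m2, m3, m4


def skewness(x: Sequence[float]) -> float:
    """Biased sample skewness m3 / m2^1.5; NaN for a constant trace."""
    m2, m3, _ = _central_moments(x)
    return m3 / m2 ** 1.5 if m2 > 0 else float("nan")


def excess_kurtosis(x: Sequence[float]) -> float:
    """Biased excess kurtosis m4 / m2^2 - 3 (0 for a Gaussian); NaN for a constant trace."""
    m2, _, m4 = _central_moments(x)
    return m4 / m2 ** 2 - 3.0 if m2 > 0 else float("nan")


def split_by_predictability(
    traces: Dict[str, Sequence[float]],
    max_abs_skew: float = 1.0,     # hypothetical threshold, not from the paper
    max_excess_kurt: float = 3.0,  # hypothetical threshold, not from the paper
) -> Tuple[List[str], List[str]]:
    """Split cells into (predictable, unpredictable) groups.

    Assumption: low |skewness| and low excess kurtosis mark a statistically
    regular cell. `traces` should hold the pretraining split only, so that
    held-out data never influences which cells are selected.
    """
    predictable: List[str] = []
    unpredictable: List[str] = []
    for cell_id, trace in traces.items():
        s, k = skewness(trace), excess_kurtosis(trace)
        # NaN compares False, so constant traces land in the unpredictable group.
        if abs(s) <= max_abs_skew and k <= max_excess_kurt:
            predictable.append(cell_id)
        else:
            unpredictable.append(cell_id)
    return predictable, unpredictable


if __name__ == "__main__":
    demo = {
        # Smooth oscillation: skewness ~0, excess kurtosis -1.5.
        "smooth": [math.sin(2 * math.pi * t / 50) for t in range(200)],
        # Sparse bursts: strongly right-skewed and heavy-tailed.
        "bursty": [10.0 if t % 20 == 0 else 0.0 for t in range(100)],
    }
    print(split_by_predictability(demo))  # (['smooth'], ['bursty'])
```

In this sketch the first list would feed the self-supervised pretraining stage and the second would be reserved for fine-tuning on downstream decoding. Computing the statistics on the pretraining split alone mirrors the abstract's note that neurons are identified on that split, which keeps evaluation data out of the selection step.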
Similar Papers
Self-Supervised YOLO: Leveraging Contrastive Learning for Label-Efficient Object Detection
CV and Pattern Recognition
Trains computers to spot objects without labeled pictures.
Self-supervised structured object representation learning
CV and Pattern Recognition
Helps computers see objects in pictures better.
Self-Supervised Dynamical System Representations for Physiological Time-Series
Machine Learning (CS)
Helps computers understand body signals better.