Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding
By: Yanchen Wang, Han Yu, Ari Blau and more
Potential Business Impact:
Helps scientists understand animal actions from videos.
The brain can only be fully understood through the lens of the behavior it generates -- a guiding principle in modern neuroscience research that nevertheless presents significant technical challenges. Many studies capture behavior with cameras, but video analysis approaches typically rely on specialized models requiring extensive labeled data. We address this limitation with BEAST (BEhavioral Analysis via Self-supervised pretraining of Transformers), a novel and scalable framework that pretrains experiment-specific vision transformers for diverse neuro-behavior analyses. BEAST combines masked autoencoding with temporal contrastive learning to effectively leverage unlabeled video data. Through comprehensive evaluation across multiple species, we demonstrate improved performance in three critical neuro-behavioral tasks: extracting behavioral features that correlate with neural activity, and pose estimation and action segmentation in both the single- and multi-animal settings. Our method establishes a powerful and versatile backbone model that accelerates behavioral analysis in scenarios where labeled data remains scarce.
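As a rough illustration of how a masked-autoencoding objective and a temporal contrastive objective can be combined into one pretraining loss, here is a minimal NumPy sketch. It is not the BEAST implementation: the function names, the 75% mask ratio, the InfoNCE form of the contrastive term, the temperature, and the weighting factor `contrastive_weight` are all assumptions made for illustration, and the abstract does not specify any of them. The sketch assumes each anchor frame is paired with a temporally nearby frame from the same video as its positive, with the other frames in the batch acting as negatives.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def masked_reconstruction_loss(patches, reconstruction, mask):
    """Mean squared error computed over the masked patches only.

    patches, reconstruction: arrays of shape (num_patches, patch_dim).
    mask: boolean array of shape (num_patches,), with at least one True.
    """
    per_patch_error = ((patches - reconstruction) ** 2).mean(axis=-1)
    return float(per_patch_error[mask].mean())


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `positives` is the positive for row i of `anchors`.

    All other rows in the batch serve as negatives (an assumption here).
    anchors, positives: arrays of shape (batch, embed_dim).
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def combined_loss(patches, reconstruction, mask, anchors, positives,
                  contrastive_weight=1.0, temperature=0.1):
    """Weighted sum of the two objectives (the weighting is hypothetical)."""
    reconstruction_term = masked_reconstruction_loss(patches, reconstruction, mask)
    contrastive_term = info_nce_loss(anchors, positives, temperature)
    return reconstruction_term + contrastive_weight * contrastive_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(16, 8))            # 16 patches from one frame
    reconstruction = patches + 0.1 * rng.normal(size=patches.shape)
    mask = random_patch_mask(16, 0.75, rng)

    anchors = rng.normal(size=(4, 32))            # embeddings of 4 anchor frames
    positives = anchors + 0.05 * rng.normal(size=anchors.shape)  # nearby frames

    print(combined_loss(patches, reconstruction, mask, anchors, positives))
```

In an actual training loop the reconstruction would come from a decoder applied to the encoder output on the visible patches, and the embeddings would come from the same encoder, so that both terms shape one shared backbone.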
Similar Papers
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder
CV and Pattern Recognition
Identifies chimps from photos without human help.
A Model Zoo of Vision Transformers
Machine Learning (CS)
Creates many AI "brains" for better computer vision.
Masked Autoencoder Self Pre-Training for Defect Detection in Microelectronics
CV and Pattern Recognition
Finds tiny flaws in computer chips.