Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder
By: Vladimir Iashin, Horace Lee, Dan Schofield, and more
Potential Business Impact:
Identifies individual chimpanzees in camera-trap footage without human-labelled data.
Camera traps are revolutionising wildlife monitoring by capturing vast amounts of visual data; however, the manual identification of individual animals remains a significant bottleneck. This study introduces a fully self-supervised approach to learning robust chimpanzee face embeddings from unlabelled camera-trap footage. Leveraging the DINOv2 framework, we train Vision Transformers on automatically mined face crops, eliminating the need for identity labels. Our method demonstrates strong open-set re-identification performance, surpassing supervised baselines on challenging benchmarks such as Bossou, despite utilising no labelled data during training. This work underscores the potential of self-supervised learning in biodiversity monitoring and paves the way for scalable, non-invasive population studies.
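As a rough illustration of the embedding-and-matching pipeline the abstract describes, the sketch below embeds face crops with a publicly released DINOv2 ViT-S/14 backbone and matches queries against a gallery by cosine similarity with an open-set threshold. The hub model name, crop size, and threshold are illustrative assumptions, not details from the paper, whose encoder is trained on mined chimpanzee face crops rather than the generic DINOv2 weights used here.

```python
# Minimal sketch: open-set re-identification with DINOv2-style face embeddings.
# Assumptions (not from the paper): the public DINOv2 ViT-S/14 weights stand in
# for the self-supervised chimpanzee-face encoder, face crops are already
# extracted and normalised, and the 0.6 similarity threshold is a placeholder.
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a DINOv2 ViT-S/14 backbone from torch.hub as a stand-in encoder.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

@torch.no_grad()
def embed(face_crops: torch.Tensor) -> torch.Tensor:
    """Map a batch of face crops (B, 3, 224, 224) to L2-normalised embeddings (B, 384)."""
    feats = model(face_crops.to(device))   # CLS-token features from the ViT
    return F.normalize(feats, dim=-1)

# Gallery: embeddings of known individuals; queries: newly detected faces.
gallery = embed(torch.randn(5, 3, 224, 224))   # placeholder tensors
queries = embed(torch.randn(2, 3, 224, 224))

# Open-set decision: accept the nearest gallery identity only if the cosine
# similarity clears the threshold; otherwise flag the query as "unknown".
THRESHOLD = 0.6                                 # illustrative value
sims = queries @ gallery.T                      # cosine similarity (both sides L2-normalised)
best_sim, best_idx = sims.max(dim=1)
for q, (s, i) in enumerate(zip(best_sim.tolist(), best_idx.tolist())):
    label = f"individual {i}" if s >= THRESHOLD else "unknown"
    print(f"query {q}: {label} (similarity {s:.2f})")
```

In practice the gallery would hold embeddings of previously identified individuals, and the threshold would be tuned on a held-out split to trade off false matches against missed detections.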
Similar Papers
GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population Monitoring
CV and Pattern Recognition
Helps save gorillas by identifying them from videos.
Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding
Neurons and Cognition
Helps scientists understand animal actions from videos.
Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings
CV and Pattern Recognition
Helps identify animals from videos without labels.