Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition
By: Huimin Liu, Jing Gao, Daria Baran and more
Potential Business Impact:
Helps farmers watch cows' health with cameras.
Cattle behaviour is a crucial indicator of an individual animal's health, productivity and overall well-being. Video-based monitoring, combined with deep learning techniques, has become a mainstream approach in animal biometrics and can offer high accuracy in some behaviour recognition tasks. We present Cattle-CLIP, a multimodal deep learning framework for cattle behaviour recognition that uses semantic cues to improve the performance of video-based visual feature recognition. It is adapted from the large-scale image-language model CLIP by adding a temporal integration module. To address the domain gap between the web data used to pre-train CLIP and real-world cattle surveillance footage, we introduce tailored data augmentation strategies and specialised text prompts. Cattle-CLIP is evaluated under both fully-supervised and few-shot learning scenarios, with a particular focus on data-scarce behaviour recognition, an important yet under-explored goal in livestock monitoring. To evaluate the proposed method, we release the CattleBehaviours6 dataset, which comprises six types of indoor behaviour: feeding, drinking, standing-self-grooming, standing-ruminating, lying-self-grooming and lying-ruminating. The dataset consists of 1905 clips collected from our John Oldacre Centre dairy farm research platform, which houses 200 Holstein-Friesian cows. Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across the six behaviours in the supervised setting, with nearly 100% recall for feeding, drinking and standing-ruminating, and generalises robustly from limited data in few-shot scenarios. These results highlight the potential of multimodal learning in agricultural and animal behaviour analysis.
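To make the image-language matching concrete, here is a minimal sketch of how a CLIP-style model can score a video clip against behaviour prompts: per-frame embeddings are pooled over time into a single clip embedding, which is then compared by cosine similarity with one text embedding per behaviour. This is an illustration under stated assumptions, not the authors' implementation. It assumes the frame and text embeddings come from pretrained CLIP encoders (not shown; toy vectors stand in for them), it uses plain mean pooling as a stand-in for Cattle-CLIP's temporal integration module (whose design the abstract does not describe), and the prompt template `a video of a cow {}.` as well as the names `temporal_mean_pool`, `classify_clip` and `build_prompts` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

# The six CattleBehaviours6 classes listed in the abstract.
BEHAVIOURS = [
    "feeding", "drinking", "standing-self-grooming",
    "standing-ruminating", "lying-self-grooming", "lying-ruminating",
]


def build_prompts(labels: Sequence[str], template: str = "a video of a cow {}.") -> List[str]:
    """Turn class labels into text prompts (hypothetical template, not the paper's)."""
    return [template.format(label) for label in labels]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; CLIP compares unit-norm embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def temporal_mean_pool(frame_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Assumed temporal integration: average unit-norm frame embeddings, then re-normalise."""
    if not frame_embeddings:
        raise ValueError("need at least one frame")
    dim = len(frame_embeddings[0])
    mean = [0.0] * dim
    for frame in frame_embeddings:
        unit = l2_normalize(frame)
        for i in range(dim):
            mean[i] += unit[i] / len(frame_embeddings)
    return l2_normalize(mean)


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_embeddings, text_embeddings, logit_scale: float = 100.0) -> Vector:
    """Return class probabilities: softmax over scaled cosine similarities.

    logit_scale=100 is an assumption (CLIP caps its learned logit scale at 100).
    """
    video = temporal_mean_pool(frame_embeddings)
    logits = []
    for text in text_embeddings:
        t_unit = l2_normalize(text)
        cosine = sum(a * b for a, b in zip(video, t_unit))
        logits.append(logit_scale * cosine)
    return softmax(logits)


# Usage with toy embeddings: one-hot "text embeddings" stand in for encoded prompts,
# and two frames lie close to the second axis ("drinking").
prompts = build_prompts(BEHAVIOURS)
text_embeddings = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
frames = [
    [0.1, 0.9, 0.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.1, 0.0, 0.0, 0.0],
]
probs = classify_clip(frames, text_embeddings)
predicted = BEHAVIOURS[max(range(len(probs)), key=probs.__getitem__)]
print(predicted)  # prints: drinking
```

Mean pooling keeps the sketch short and order-independent; a learned temporal module, which the abstract says Cattle-CLIP adds, could instead weight frames or model their order, which matters for behaviours that differ mainly in motion.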
Similar Papers
Automatic Retrieval of Specific Cows from Unlabeled Videos
CV and Pattern Recognition
Identifies cows automatically from videos without deep learning.
CCoMAML: Efficient Cattle Identification Using Cooperative Model-Agnostic Meta-Learning
CV and Pattern Recognition
Identifies cows by their nose prints, even with little data.
AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis
CV and Pattern Recognition
Helps computers understand animal movements and actions.