When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection

Published: September 8, 2025 | arXiv ID: 2509.06427v1

By: Rabin Dulal, Lihong Zheng, Muhammad Ashad Kabir

Potential Business Impact:

Identifies cows by their nose patterns automatically.

Business Areas:
Image Recognition, Data and Analytics, Software

Muzzle patterns are among the most effective biometric traits for cattle identification. Fast and accurate detection of the muzzle region as the region of interest is critical to automatic visual cattle identification. Earlier approaches relied on manual detection, which is labor-intensive and inconsistent. Recently, automated methods using supervised models like YOLO have become popular for muzzle detection. Although effective, these methods require extensive annotated datasets and tend to depend heavily on their training data, which limits their performance on new or unseen cattle. To address these limitations, this study proposes a zero-shot muzzle detection framework based on Grounding DINO, a vision-language model capable of detecting muzzles without any task-specific training or annotated data. This approach uses natural language prompts to guide detection, enabling scalable and flexible muzzle localization across diverse breeds and environments. Our model achieves a mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) of 76.8%, demonstrating promising performance without requiring annotated data. To our knowledge, this is the first research to provide a real-world, industry-oriented, and annotation-free solution for cattle muzzle detection. The framework offers a practical alternative to supervised methods, promising improved adaptability and ease of deployment in livestock monitoring applications.
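
For readers unfamiliar with the metric: at an IoU threshold of 0.5, a predicted box counts as a correct detection only when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that matching criterion only; it is not the authors' evaluation code. The function names and the (x1, y1, x2, y2) box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: does a prediction match a ground-truth box?"""
    return iou(pred_box, gt_box) >= threshold
```

A full mAP computation additionally ranks detections by confidence and averages precision over recall levels; this sketch covers only the per-box matching step.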

Country of Origin
🇦🇺 Australia

Page Count
12 pages

Category
Computer Science:
Computer Vision and Pattern Recognition