Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark

Published: March 19, 2025 | arXiv ID: 2503.14862v2

By: Ying Liu, Yijing Hua, Haojiang Chai and more

Potential Business Impact:

Helps computers see and name new things.

Business Areas:

Image Recognition, Data and Analytics, Software

Open-vocabulary detectors are proposed to locate and recognize objects in novel classes. However, variations in the vision-aware language vocabulary data used for open-vocabulary learning can lead to unfair and unreliable evaluations. Recent evaluation methods have attempted to address this issue by incorporating object properties or by adding locations and characteristics to the captions. Nevertheless, since these properties and locations depend on the specific details of each image rather than on the classes, detectors cannot make accurate predictions without precise descriptions provided through human annotation. This paper introduces 3F-OVD, a novel task that extends supervised fine-grained object detection to the open-vocabulary setting. Our task is intuitive and challenging, requiring a deep understanding of Fine-grained captions and careful attention to Fine-grained details in images in order to accurately detect Fine-grained objects. Additionally, due to the scarcity of qualified fine-grained object detection datasets, we have created a new dataset, NEU-171K, tailored for both supervised and open-vocabulary settings. We benchmark state-of-the-art object detectors on our dataset for both settings. Furthermore, we propose a simple yet effective post-processing technique.

A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection

CV and Pattern Recognition

Teaches computers to find any object, even new ones.

13 Mar 2025

ODOV: Towards Open-Domain Open-Vocabulary Object Detection

CV and Pattern Recognition

Helps computers recognize any object anywhere.

2 Aug 2025

OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations

CV and Pattern Recognition

Finds objects in 3D rooms without human labels.

27 Aug 2025

Repos / Data Links

github.com

Page Count

8 pages
