LLM-Guided Agentic Object Detection for Open-World Understanding

Published: July 14, 2025 | arXiv ID: 2507.10844v1

By: Furkan Mumcu, Michael J. Jones, Anoop Cherian and more

Potential Business Impact:

Lets computers find and name new things.

Business Areas:

Autonomous Vehicles, Transportation

Object detection traditionally relies on fixed category sets, requiring costly re-training to handle novel objects. While Open-World and Open-Vocabulary Object Detection (OWOD and OVOD) improve flexibility, OWOD lacks semantic labels for unknowns, and OVOD depends on user prompts, limiting autonomy. We propose an LLM-guided agentic object detection (LAOD) framework that enables fully label-free, zero-shot detection by prompting a Large Language Model (LLM) to generate scene-specific object names. These are passed to an open-vocabulary detector for localization, allowing the system to adapt its goals dynamically. We introduce two new metrics, Class-Agnostic Average Precision (CAAP) and Semantic Naming Average Precision (SNAP), to separately evaluate localization and naming. Experiments on LVIS, COCO, and COCO-OOD validate our approach, showing strong performance in detecting and naming novel objects. Our method offers enhanced autonomy and adaptability for open-world understanding.
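
The two-stage flow described in the abstract (an LLM proposes scene-specific object names, then an open-vocabulary detector localizes them) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `laod_sketch`, `propose_names`, `detect`, `min_score`, and the stub functions are hypothetical names introduced here, and the real system's prompting, filtering, and detector are not specified in this summary. The `iou` helper shows the standard box-overlap measure commonly used when localization is scored independently of labels; it is not the paper's definition of CAAP or SNAP.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention


@dataclass
class Detection:
    label: str
    box: Box
    score: float


def laod_sketch(
    image: Any,
    propose_names: Callable[[Any], Sequence[str]],            # hypothetical LLM wrapper
    detect: Callable[[Any, List[str]], List[Detection]],      # hypothetical OVOD wrapper
    min_score: float = 0.3,                                   # assumed threshold
) -> List[Detection]:
    """Ask the LLM for object names, then let the detector localize them."""
    # Normalize and de-duplicate the names the LLM proposed for this scene.
    names = sorted({n.strip().lower() for n in propose_names(image) if n.strip()})
    if not names:
        return []
    # The detector receives the generated names instead of user-written prompts.
    detections = detect(image, names)
    return [d for d in detections if d.score >= min_score]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Stand-in stubs so the sketch runs without any model; real components
# would call an actual LLM and an actual open-vocabulary detector.
def fake_llm(image: Any) -> List[str]:
    return ["Dog", "frisbee", " dog "]


def fake_detector(image: Any, names: List[str]) -> List[Detection]:
    return [
        Detection(names[0], (0, 0, 10, 10), 0.9),
        Detection(names[1], (5, 5, 15, 15), 0.2),
    ]


if __name__ == "__main__":
    for det in laod_sketch(None, fake_llm, fake_detector):
        print(det)
```

Passing the LLM and the detector in as callables keeps the sketch independent of any particular model or library, which matches the abstract's framing of the detector as a separate, swappable localization stage.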

OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection

Artificial Intelligence

Teaches computers to find any object, even new ones.

26 Nov 2025

89%

Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection

CV and Pattern Recognition

Helps computers spot weird things in pictures.

9 Jan 2025

89%

When LLMs meet open-world graph learning: a new perspective for unlabeled data uncertainty

Machine Learning (CS)

Helps computers learn from messy, incomplete data.

20 May 2025

Country of Origin

🇺🇸 United States

Repos / Data Links

github.com

Page Count

10 pages
