
From Label Error Detection to Correction: A Modular Framework and Benchmark for Object Detection Datasets

Published: August 6, 2025 | arXiv ID: 2508.06556v1

By: Sarina Penquitt, Jonathan Klees, Rinor Cakaj and more

Potential Business Impact:

Fixes mistakes in computer vision training data.

Object detection has advanced rapidly in recent years, driven by increasingly large and diverse datasets. However, label errors, defined as missing labels, incorrect classification, or inaccurate localization, often compromise the quality of these datasets. This can have a significant impact on the outcomes of training and benchmark evaluations. Although several methods now exist for detecting label errors in object detection datasets, they are typically validated only on synthetic benchmarks or through limited manual inspection. How to correct such errors systematically and at scale therefore remains an open problem. We introduce a semi-automated framework for label-error correction called REC$\checkmark$D (Rechecked). Building on existing detectors, the framework pairs their error proposals with lightweight, crowdsourced microtasks. These tasks enable multiple annotators to independently verify each candidate bounding box, and their responses are aggregated to estimate ambiguity and improve label quality. To demonstrate the effectiveness of REC$\checkmark$D, we apply it to the class pedestrian in the KITTI dataset. Our crowdsourced review yields high-quality corrected annotations, which indicate an error rate of at least 24% (missing or inaccurate annotations) in the original labels. This validated set will be released as a new real-world benchmark for label error detection and correction. We show that current label error detection methods, when combined with our correction framework, can recover hundreds of errors in the time it would take a human to annotate bounding boxes from scratch. However, even the best methods still miss up to 66% of the true errors and, with low-quality labels, introduce more errors than they find. This highlights the urgent need for further research, now enabled by our released benchmark.
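The abstract does not say how annotator responses are aggregated, so the following is only a minimal sketch of one plausible scheme: a majority vote per candidate bounding box, with the normalized Shannon entropy of the votes standing in for ambiguity. The function name `aggregate_votes`, the `num_options` parameter, and the answer strings are hypothetical and are not taken from REC$\checkmark$D.

```python
from collections import Counter
from math import log2


def aggregate_votes(votes, num_options=2):
    """Aggregate independent annotator answers for one candidate box.

    Assumption (not from the paper): majority vote decides the label and
    normalized Shannon entropy measures ambiguity. Ambiguity is 0.0 when
    all annotators agree and 1.0 when the votes are spread evenly over
    `num_options` possible answers.
    """
    if not votes:
        raise ValueError("need at least one vote")
    if num_options < 2:
        raise ValueError("num_options must be at least 2")

    counts = Counter(votes)
    total = len(votes)
    # On a tie, most_common keeps the answer that was seen first.
    label, _ = counts.most_common(1)[0]

    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    ambiguity = min(entropy / log2(num_options), 1.0)
    return label, ambiguity


if __name__ == "__main__":
    # Hypothetical answers from five annotators for one proposed box.
    answers = ["pedestrian", "pedestrian", "not_pedestrian",
               "pedestrian", "pedestrian"]
    print(aggregate_votes(answers))
```

Boxes with high ambiguity could then be routed to further review rather than accepted automatically; that routing is a design choice of this sketch, not a documented step of the framework.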

Learning to Detect Label Errors by Making Them: A Method for Segmentation and Object Detection Datasets

Machine Learning (CS)

Finds mistakes in AI image training labels

25 Aug 2025 1

88%

Pseudo-Labeling Driven Refinement of Benchmark Object Detection Datasets via Analysis of Learning Patterns

CV and Pattern Recognition

Fixes computer vision mistakes for better object finding.

1 Jun 2025 1

87%

Continual Error Correction on Low-Resource Devices

CV and Pattern Recognition

Fixes AI mistakes with just a few examples.

26 Nov 2025 1


Country of Origin

🇩🇪 Germany

Repos / Data Links

github.com

Page Count

21 pages
