From Label Error Detection to Correction: A Modular Framework and Benchmark for Object Detection Datasets
By: Sarina Penquitt , Jonathan Klees , Rinor Cakaj and more
Potential Business Impact:
Fixes mistakes in computer vision training data.
Object detection has advanced rapidly in recent years, driven by increasingly large and diverse datasets. However, label errors, defined as missing labels, incorrect classification or inaccurate localization, often compromise the quality of these datasets. This can have a significant impact on the outcomes of training and benchmark evaluations. Although several methods now exist for detecting label errors in object detection datasets, they are typically validated only on synthetic benchmarks or limited manual inspection. How to correct such errors systemically and at scale therefore remains an open problem. We introduce a semi-automated framework for label-error correction called REC$\checkmark$D (Rechecked). Building on existing detectors, the framework pairs their error proposals with lightweight, crowd-sourced microtasks. These tasks enable multiple annotators to independently verify each candidate bounding box, and their responses are aggregated to estimate ambiguity and improve label quality. To demonstrate the effectiveness of REC$\checkmark$D, we apply it to the class pedestrian in the KITTI dataset. Our crowdsourced review yields high-quality corrected annotations, which indicate a rate of at least 24% of missing and inaccurate annotations in original annotations. This validated set will be released as a new real-world benchmark for label error detection and correction. We show that current label error detection methods, when combined with our correction framework, can recover hundreds of errors in the time it would take a human to annotate bounding boxes from scratch. However, even the best methods still miss up to 66% of the true errors and with low quality labels introduce more errors than they find. This highlights the urgent need for further research, now enabled by our released benchmark.
Similar Papers
Learning to Detect Label Errors by Making Them: A Method for Segmentation and Object Detection Datasets
Machine Learning (CS)
Finds mistakes in AI image training labels
Pseudo-Labeling Driven Refinement of Benchmark Object Detection Datasets via Analysis of Learning Patterns
CV and Pattern Recognition
Fixes computer vision mistakes for better object finding.
Continual Error Correction on Low-Resource Devices
CV and Pattern Recognition
Fixes AI mistakes with just a few examples.