SELECT: Detecting Label Errors in Real-world Scene Text Data

Published: December 16, 2025 | arXiv ID: 2512.14050v1

By: Wenjun Liu, Qian Wu, Yifeng Hu and more

Potential Business Impact:

Finds labeling mistakes in pictures of text so that text-reading software can be trained to read more accurately.

Business Areas:

Image Recognition, Data and Analytics, Software

We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Using an image-text encoder and a character-level tokenizer, SELECT handles variable-length sequence labels, label sequence misalignment, and character-level errors, and it outperforms existing methods in accuracy and practical utility. We also introduce Similarity-based Sequence Label Corruption (SSLC), a process that deliberately injects errors into the training labels to mimic real-world error scenarios. SSLC can change the sequence length, and it takes the visual similarity between characters into account when corrupting a label. Our method is the first to successfully detect label errors in real-world scene text datasets while accounting for variable-length labels. Experimental results show that SELECT detects label errors effectively and improves scene text recognition (STR) accuracy on real-world text datasets.
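To make the idea behind SSLC concrete, the following is a minimal sketch of similarity-aware label corruption, not the authors' implementation. The abstract does not specify how visual similarity is measured or how corruption operations are sampled, so the look-alike table `VISUALLY_SIMILAR`, the function name `corrupt_label`, the probability `p`, and the three operations (substitute, delete, insert) are all assumptions chosen for illustration.

```python
import random

# Hypothetical table of visually confusable characters. This is an assumption
# for illustration; the abstract does not say how similarity is defined.
VISUALLY_SIMILAR = {
    "0": ["O", "o"],
    "O": ["0", "Q"],
    "o": ["0", "e"],
    "1": ["l", "I"],
    "l": ["1", "I"],
    "I": ["l", "1"],
    "5": ["S"],
    "S": ["5"],
    "8": ["B"],
    "B": ["8"],
}


def corrupt_label(label, rng, p=0.2):
    """Return a corrupted copy of `label` (illustrative sketch only).

    Each character that has an entry in VISUALLY_SIMILAR is corrupted with
    probability `p` by one of three operations:
      - substitute: replace it with a look-alike (length unchanged)
      - delete:     drop it (length shrinks by one)
      - insert:     keep it and add a look-alike after it (length grows by one)
    Characters without an entry are always kept as they are.
    """
    out = []
    for ch in label:
        similar = VISUALLY_SIMILAR.get(ch)
        if similar is None or rng.random() >= p:
            out.append(ch)
            continue
        op = rng.choice(["substitute", "delete", "insert"])
        if op == "substitute":
            out.append(rng.choice(similar))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(similar))
        # op == "delete": append nothing
    corrupted = "".join(out)
    # Avoid returning an empty label; fall back to the original instead.
    return corrupted if corrupted else label


if __name__ == "__main__":
    rng = random.Random(0)
    for word in ["Hello1", "BOOK5", "SOLO"]:
        print(word, "->", corrupt_label(word, rng, p=0.5))
```

Because deletion and insertion are included alongside substitution, the corrupted label can differ in length from the original, which is the property the abstract highlights. In a training pipeline, pairs of clean and corrupted labels like these could serve as negative examples for an error detector, though how SELECT actually consumes them is not described here.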