Noisy Label Refinement with Semantically Reliable Synthetic Images
By: Yingxuan Li, Jiafeng Mao, Yusuke Matsui
Potential Business Impact:
Fixes wrongly labeled training pictures for computer vision using synthetic images.
Semantic noise in image classification datasets, where visually similar categories are frequently mislabeled as one another, poses a significant challenge to conventional supervised learning. In this paper, we explore using synthetic images generated by advanced text-to-image models to address this issue. These high-quality synthetic images come with reliable labels, but domain gaps and limited diversity restrict their direct use in training. Unlike conventional approaches, we propose a novel method that uses synthetic images as reliable reference points to identify and correct mislabeled samples in noisy datasets. Extensive experiments on multiple benchmark datasets show that our approach significantly improves classification accuracy under various noise conditions, especially in challenging scenarios with semantic label noise. Moreover, because our method is orthogonal to existing noise-robust learning techniques, it can be combined with them: paired with state-of-the-art noise-robust training methods, it improves accuracy by 30% on CIFAR-10 and by 11% on CIFAR-100 under 70% semantic noise, and by 24% on ImageNet-100 under real-world noise conditions.
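The abstract does not say how synthetic images serve as "reference points", so the following is only an illustrative sketch of one possible reading, not the authors' method. It assumes that image features have already been extracted by some encoder, that each class is summarized by the mean feature of its synthetic images, and that a noisy label is replaced only when another class prototype is closer by a chosen margin. The function names `class_prototypes` and `refine_labels`, the cosine similarity measure, and the `margin` parameter are all hypothetical.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_prototypes(synthetic_features, synthetic_labels):
    """Return the mean feature vector per class, using synthetic images only.

    Assumption: synthetic labels are trusted because each image was
    generated from a prompt naming its class.
    """
    sums, counts = {}, {}
    for feat, label in zip(synthetic_features, synthetic_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def refine_labels(real_features, noisy_labels, prototypes, margin=0.1):
    """Relabel a sample only when another prototype beats its given class by `margin`.

    The margin (a hypothetical hyperparameter) keeps ambiguous samples
    unchanged instead of flipping them on weak evidence.
    """
    refined = []
    for feat, given in zip(real_features, noisy_labels):
        sims = {c: _cosine(feat, p) for c, p in prototypes.items()}
        best = max(sims, key=sims.get)
        given_sim = sims.get(given, -1.0)  # unknown label: treat as least similar
        if best != given and sims[best] - given_sim > margin:
            refined.append(best)
        else:
            refined.append(given)
    return refined


if __name__ == "__main__":
    # Toy 2-D "features"; in practice these would come from an image encoder.
    synthetic_feats = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
    synthetic_labels = ["cat", "cat", "dog", "dog"]
    prototypes = class_prototypes(synthetic_feats, synthetic_labels)

    real_feats = [[0.95, 0.05], [0.05, 0.95], [0.55, 0.5]]
    noisy_labels = ["dog", "dog", "dog"]
    print(refine_labels(real_feats, noisy_labels, prototypes))
```

In this toy run the first sample sits on the "cat" prototype and is relabeled, the second already matches its label, and the third is too ambiguous to clear the margin, so its given label is kept. The refined labels could then be passed to any noise-robust training procedure, in line with the abstract's remark that the method is orthogonal to such techniques.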