Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook
By: Yuan Ma, Junlin Hou, Chao Zhang, and more
Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsistent or erroneous labels. Despite extensive research on learning with noisy labels (LNL), the robustness of existing methods in medical imaging has not been systematically assessed. To address this gap, we introduce LNMBench, a comprehensive benchmark for Label Noise in Medical imaging. LNMBench encompasses 10 representative methods evaluated across 7 datasets, 6 imaging modalities, and 3 noise patterns, establishing a unified and reproducible framework for robustness evaluation under realistic conditions. Comprehensive experiments reveal that the performance of existing LNL methods degrades substantially under high and real-world noise, highlighting the persistent challenges of class imbalance and domain variability in medical data. Motivated by these findings, we further propose a simple yet effective improvement to enhance model robustness under such conditions. The LNMBench codebase is publicly available at https://github.com/myyy777/LNMBench to facilitate standardized evaluation, promote reproducible research, and provide practical insights for developing noise-resilient algorithms in both research and real-world medical applications.
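The abstract does not spell out the three noise patterns, but LNL benchmarks of this kind typically inject synthetic symmetric and class-dependent (asymmetric) noise alongside real-world annotation noise. The sketch below is not LNMBench's actual API (the function names are hypothetical); it only illustrates how the two synthetic patterns are commonly simulated on a label array:

```python
import numpy as np

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label to a uniformly random *different* class with probability noise_rate."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.flatnonzero(flip):
        # Shift by a random offset in [1, num_classes) so the label truly changes,
        # uniformly over the other num_classes - 1 classes.
        offset = rng.integers(1, num_classes)
        noisy[i] = (noisy[i] + offset) % num_classes
    return noisy

def inject_asymmetric_noise(labels, noise_rate, transition, seed=0):
    """Flip labels along a fixed class-confusion map (e.g. visually similar diagnoses)."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.flatnonzero(flip):
        noisy[i] = transition.get(int(noisy[i]), int(noisy[i]))
    return noisy

# Example: a 5-class task with 40% symmetric noise.
clean = np.tile(np.arange(5), 20)
noisy = inject_symmetric_noise(clean, noise_rate=0.4, num_classes=5)
print(f"observed flip rate: {(noisy != clean).mean():.2f}")
```

Real-world noise, by contrast, cannot be simulated this way; it comes from datasets whose original labels are known to disagree with expert re-annotation, which is why benchmarks like LNMBench evaluate it separately.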
Similar Papers
FNBench: Benchmarking Robust Federated Learning against Noisy Labels
CV and Pattern Recognition
Benchmarks the robustness of federated learning methods against noisy labels.
From Fuzzy Speech to Medical Insight: Benchmarking LLMs on Noisy Patient Narratives
Computation and Language
Benchmarks LLMs on extracting medical insight from noisy patient narratives.
Clinical Expert Uncertainty Guided Generalized Label Smoothing for Medical Noisy Label Learning
Machine Learning (CS)
Uses clinical expert uncertainty to guide generalized label smoothing for learning from noisy medical labels.