Adaptive Label Error Detection: A Bayesian Approach to Mislabeled Data Detection
By: Zan Chaudhry, Noam H. Rotenberg, Brian Caffo, et al.
Machine learning classification systems are susceptible to poor performance when trained with incorrect ground truth labels, even when data is well-curated by expert annotators. As machine learning becomes more widespread, it is increasingly imperative to identify and correct mislabeling to develop more powerful models. In this work, we motivate and describe Adaptive Label Error Detection (ALED), a novel method of detecting mislabeling. ALED extracts an intermediate feature space from a deep convolutional neural network, denoises the features, models the reduced manifold of each class with a multidimensional Gaussian distribution, and performs a simple likelihood ratio test to identify mislabeled samples. We show that ALED has markedly increased sensitivity, without compromising precision, compared to established label error detection methods, on multiple medical imaging datasets. We demonstrate an example where fine-tuning a neural network on corrected data results in a 33.8% decrease in test set errors, providing strong benefits to end users. The ALED detector is deployed in the Python package statlab.
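The abstract outlines the ALED pipeline at a high level: extract intermediate CNN features, denoise them, fit a per-class multivariate Gaussian, and apply a likelihood ratio test. The sketch below shows one plausible way those steps could fit together; it is not the statlab implementation, and the function name `detect_label_errors`, the choice of PCA for denoising, and the `threshold` parameter are illustrative assumptions.

```python
# Illustrative sketch of an ALED-style detector (assumed structure, not statlab's API):
# 1) take intermediate CNN features, 2) denoise/reduce with PCA, 3) fit one
# multivariate Gaussian per class, 4) flag samples whose assigned class is much
# less likely than the best alternative class under a log-likelihood ratio test.
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import multivariate_normal


def detect_label_errors(features, labels, n_components=32, threshold=0.0):
    """Return indices of samples suspected to be mislabeled.

    features:  (n_samples, n_features) intermediate CNN activations.
    labels:    (n_samples,) integer class labels.
    threshold: log-likelihood-ratio cutoff; 0.0 flags any sample whose best
               alternative class is more likely than its assigned class.
    """
    # Denoise / reduce the feature manifold.
    z = PCA(n_components=n_components).fit_transform(features)

    # Fit one multivariate Gaussian per class on the reduced features.
    classes = np.unique(labels)
    gaussians = {}
    for c in classes:
        zc = z[labels == c]
        mean = zc.mean(axis=0)
        # Small ridge on the covariance keeps it invertible for small classes.
        cov = np.cov(zc, rowvar=False) + 1e-6 * np.eye(z.shape[1])
        gaussians[c] = multivariate_normal(mean=mean, cov=cov)

    # Log-likelihood of every sample under every class model.
    loglik = np.stack([gaussians[c].logpdf(z) for c in classes], axis=1)

    # Likelihood ratio: best alternative class vs. the assigned class.
    assigned = np.searchsorted(classes, labels)
    own = loglik[np.arange(len(labels)), assigned]
    loglik_other = loglik.copy()
    loglik_other[np.arange(len(labels)), assigned] = -np.inf
    best_other = loglik_other.max(axis=1)

    return np.where(best_other - own > threshold)[0]
```

Flagged indices could then be reviewed by annotators and corrected before fine-tuning, which is the workflow the abstract describes for the reported 33.8% reduction in test set errors.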