Learning Accurate Models on Incomplete Data with Minimal Imputation
By: Cheng Zhen, Nischal Aryal, Arash Termehchy and more
Potential Business Impact:
Fixes messy data faster for smarter computers.
Real-world datasets often contain missing values, and imputing them in order to learn accurate machine learning (ML) models takes significant time and effort. In this paper, we demonstrate that imputing all missing values is not always necessary to achieve an accurate ML model. We introduce the concept of minimal data imputation, which ensures that ML models trained over the imputed dataset are accurate. Implementing minimal imputation guarantees both minimal imputation effort and optimal ML models. We propose algorithms to find exact and approximate minimal imputations for various ML models. Our extensive experiments indicate that the proposed algorithms significantly reduce the time and effort required for data imputation.
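To make the central claim concrete, here is a toy sketch of why some missing values need not be imputed. It is not the paper's algorithm: the 1-D hard-margin threshold model, the dataset, the per-cell candidate values, and the helper names `fit_threshold` and `needs_imputation` are all assumptions made for illustration. A missing cell is treated as safe to skip when the learned model is identical for every candidate value of that cell, whatever values the other missing cells take.

```python
from itertools import product

def fit_threshold(points):
    """Toy 1-D hard-margin classifier (illustrative assumption):
    the threshold is the midpoint between the largest negative
    and the smallest positive feature value."""
    largest_neg = max(x for x, y in points if y == -1)
    smallest_pos = min(x for x, y in points if y == +1)
    return (largest_neg + smallest_pos) / 2

# (feature, label) rows; None marks a missing feature value.
data = [(1.0, -1), (2.0, -1), (None, -1), (6.0, 1), (None, 1), (9.0, 1)]

# Hypothetical candidate values for each missing row (row index -> values).
candidates = {2: [0.5, 1.0, 1.5], 4: [5.0, 7.0, 8.0]}

def needs_imputation(data, candidates, cell):
    """Brute force: True if changing `cell` can change the learned model
    for at least one assignment of the other missing cells."""
    others = [c for c in candidates if c != cell]
    for combo in product(*(candidates[c] for c in others)):
        models = set()
        for value in candidates[cell]:
            filled = list(data)
            for c, v in zip(others, combo):
                filled[c] = (v, data[c][1])
            filled[cell] = (value, data[cell][1])
            models.add(fit_threshold(filled))
        if len(models) > 1:
            return True
    return False

# Only cells that can change the model need to be imputed.
to_impute = [c for c in sorted(candidates) if needs_imputation(data, candidates, c)]
print(to_impute)
```

In this toy setup, row 2 can stay unimputed because every candidate lies below the existing largest negative value, whereas row 4 can move the smallest positive value and therefore the threshold. The enumeration is exponential in the number of missing cells, so it only illustrates the idea and says nothing about how the exact or approximate algorithms mentioned in the abstract work.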
Similar Papers
Missing Data in Signal Processing and Machine Learning: Models, Methods and Modern Approaches
Signal Processing
Fixes computer problems when information is missing.
To impute or not to impute: How machine learning modelers treat missing data
Machine Learning (CS)
Helps computers learn better by fixing missing info.
An Interdisciplinary and Cross-Task Review on Missing Data Imputation
Machine Learning (Stat)
Fixes broken data for better computer decisions.