To impute or not to impute: How machine learning modelers treat missing data
By: Wanyi Chen, Mary Cummings
Potential Business Impact:
Shows that most ML practitioners choose missing data treatments without informed reasoning, which can undermine the validity of the models they train.
Missing data is prevalent in the tabular datasets used to train machine learning (ML) models, and the choice of missing data treatment method can significantly affect training results. However, little is known about how ML researchers and engineers choose these methods or what factors affect their choices. To this end, we conducted a survey of 70 ML researchers and engineers. Our results revealed that most participants were not making informed decisions regarding missing data treatment, which could significantly affect the validity of the ML models they train. We advocate for better education on missing data, more standardized missing data reporting, and better missing data analysis tools.
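To make the abstract's point about treatment choice concrete, the following is a minimal sketch comparing two common treatments, listwise deletion and mean imputation, on a single column. The `ages` values and the helper names `drop_missing` and `mean_impute` are hypothetical and chosen only for illustration; they are not taken from the paper or its survey.

```python
from statistics import mean, variance

# Hypothetical toy column; None marks a missing entry.
ages = [22.0, None, 35.0, 41.0, None, 58.0]


def drop_missing(values):
    """Listwise deletion: keep only the observed entries."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Mean imputation: replace each missing entry with the observed mean."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]


deleted = drop_missing(ages)
imputed = mean_impute(ages)

# Both treatments give the same mean, but they differ in how many rows
# remain and in the sample variance a downstream model would see.
print("deletion:   n =", len(deleted), "mean =", mean(deleted), "var =", variance(deleted))
print("imputation: n =", len(imputed), "mean =", mean(imputed), "var =", variance(imputed))
```

Deletion shrinks the dataset, while mean imputation keeps every row but understates the spread of the column. Neither is correct by default; which one is appropriate depends on why the values are missing.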