Prediction-Oriented Subsampling from Data Streams
By: Benedetta Lavinia Mussati, Freddie Bickford Smith, Tom Rainforth and more
Potential Business Impact:
Teaches computers to learn from fast-moving information.
Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.
Similar Papers
Panprediction: Optimal Predictions for Any Downstream Task and Loss
Machine Learning (CS)
Teaches computers to solve many problems at once.
A subsampling approach for large data sets when the Generalised Linear Model is potentially misspecified
Methodology
Makes big data analysis faster and more accurate.
Uncertainty-Aware Subset Selection for Robust Visual Explainability under Distribution Shifts
CV and Pattern Recognition
Makes AI's image explanations more trustworthy everywhere.