
Prediction-Oriented Subsampling from Data Streams

Published: August 5, 2025 | arXiv ID: 2508.03868v1

By: Benedetta Lavinia Mussati, Freddie Bickford Smith, Tom Rainforth and more

Potential Business Impact:

Teaches computers to learn from fast-moving information.

Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.
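To make the idea of "reducing uncertainty in downstream predictions of interest" concrete, here is a minimal sketch of one way such a score could be computed for a classifier. It is an illustration under stated assumptions, not the paper's implementation: it assumes the model's posterior is approximated by K parameter samples (for example an ensemble), that a set of target inputs representing the predictions of interest is available, and that a candidate's score is the mutual information between its label and the label at a target input, averaged over target inputs. The function names `predictive_information_gain` and `select_top_k` are hypothetical.

```python
import numpy as np


def predictive_information_gain(probs_cand, probs_targ, eps=1e-12):
    """Score candidates by how much their labels tell us about target predictions.

    Assumed inputs (not taken from the paper):
      probs_cand: array (K, N, C), class probabilities for N candidate stream
                  points under K posterior parameter samples.
      probs_targ: array (K, M, C), class probabilities for M target inputs
                  under the same K parameter samples.
    Returns an array (N,) with the mutual information between the candidate
    label and the target label, averaged over the M target inputs.
    """
    num_samples = probs_cand.shape[0]
    # Joint predictive: joint[n, m, c, d] = mean_k p_k(y=c | x_n) p_k(y*=d | x*_m)
    joint = np.einsum("knc,kmd->nmcd", probs_cand, probs_targ) / num_samples
    # Marginal predictives, averaged over parameter samples
    marg_cand = probs_cand.mean(axis=0)  # (N, C)
    marg_targ = probs_targ.mean(axis=0)  # (M, C)
    # Product of marginals, broadcast to the shape of the joint
    indep = marg_cand[:, None, :, None] * marg_targ[None, :, None, :]
    # KL(joint || product of marginals) is the mutual information per (n, m) pair
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(2, 3))
    return mi.mean(axis=1)


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring candidates, best first."""
    return np.argsort(scores)[::-1][:k]
```

In a streaming setting, a sketch like this would be applied to each incoming batch: score the batch, keep the top-scoring points, and discard the rest. A candidate on which all parameter samples agree scores zero, since observing its label cannot change the target predictions.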