Data Quality Issues in Flare Prediction using Machine Learning Models
By: Ke Hu, Kevin Jin, Victor Verma and more
Potential Business Impact:
Fixes bad space data to predict solar flares better.
Machine learning models for forecasting solar flares have been trained and tested using a variety of data sources, such as Space Weather Prediction Center (SWPC) operational and science-quality data. Typically, data from these sources are minimally processed before being used to train and validate a forecasting model. However, predictive performance can be impaired if defects in, and inconsistencies between, these data sources are ignored. For a number of commonly used data sources, together with the software packages that query them and output processed data, we identify their respective defects and inconsistencies, quantify their extent, and show how they can affect the predictions produced by data-driven machine learning forecasting models. We also outline procedures for fixing these issues or at least mitigating their impacts. Finally, based on thorough comparisons of how the choice of data source affects the predictive skill scores of the trained forecasting model, we offer recommendations for the use of different data products in operational forecasting.
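The abstract compares data sources by predictive skill scores but does not say which scores are used. As a rough illustration only, the sketch below assumes two scores that are common in flare forecasting, the True Skill Statistic (TSS) and the Heidke Skill Score (HSS), computed from a binary flare / no-flare confusion matrix. The function name `skill_scores` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def skill_scores(tp, fp, fn, tn):
    """Return (TSS, HSS) for a binary flare / no-flare forecast.

    tp, fp, fn, tn are confusion-matrix counts (hypothetical inputs).
    Assumes both classes occur at least once, so no denominator is zero.
    """
    # TSS = hit rate minus false alarm rate.
    tss = tp / (tp + fn) - fp / (fp + tn)

    # HSS = improvement in correct forecasts over random chance.
    n = tp + fp + fn + tn
    expected_correct = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / n
    hss = (tp + tn - expected_correct) / (n - expected_correct)
    return tss, hss


# Illustrative counts only: the same model evaluated against labels
# derived from two different (hypothetical) data products.
scores_a = skill_scores(tp=40, fp=10, fn=10, tn=40)
scores_b = skill_scores(tp=30, fp=20, fn=20, tn=30)
print("source A (TSS, HSS):", scores_a)
print("source B (TSS, HSS):", scores_b)
```

Computing the same scores on labels built from each data product is one simple way to see how defects in a source could shift the reported skill of an otherwise identical model.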
Similar Papers
How Data Quality Affects Machine Learning Models for Credit Risk Assessment
Machine Learning (CS)
Makes loan decisions more accurate even with bad data.
Solar Flare Forecast: A Comparative Analysis of Machine Learning Algorithms for Solar Flare Class Prediction
Solar and Stellar Astrophysics
Predicts sun flares to protect Earth's technology.
FLARE-SSM: Deep State Space Models with Influence-Balanced Loss for 72-Hour Solar Flare Prediction
CV and Pattern Recognition
Predicts big sun flares to protect Earth.