Data Quality Issues in Flare Prediction using Machine Learning Models

Published: December 15, 2025 | arXiv ID: 2512.13417v1

By: Ke Hu, Kevin Jin, Victor Verma, and more

Potential Business Impact:

Fixes flawed space-weather data so that solar flares can be predicted more accurately.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Machine learning models for forecasting solar flares have been trained and tested using a variety of data sources, such as Space Weather Prediction Center (SWPC) operational and science-quality data. Typically, data from these sources are minimally processed before being used to train and validate a forecasting model. However, predictive performance can be impaired if defects in, and inconsistencies between, these data sources are ignored. For a number of commonly used data sources, together with the software that queries them and outputs processed data, we identify their respective defects and inconsistencies, quantify their extent, and show how they can affect the predictions produced by data-driven machine learning forecasting models. We also outline procedures for fixing these issues or at least mitigating their impacts. Finally, based on thorough comparisons of how the choice of data source affects the trained forecasting model's predictive skill scores, we offer recommendations for the use of different data products in operational forecasting.
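The abstract compares data sources by the skill scores of the resulting forecasts, but does not name the scores. Flare-forecasting work commonly reports the True Skill Statistic (TSS) and the Heidke Skill Score (HSS), so the sketch below assumes those two and computes them from binary flare / no-flare labels. The function names, the `Confusion` type, and the example labels are all hypothetical and only illustrate how two models trained on different data products could be compared on the same observations.

```python
"""Sketch: skill scores for a binary flare / no-flare forecast.

Assumes labels are 1 (flare) and 0 (no flare). TSS and HSS are assumed
metrics here, not necessarily the exact scores used in the paper.
"""
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int  # flare forecast, flare observed (hit)
    fp: int  # flare forecast, no flare observed (false alarm)
    fn: int  # no flare forecast, flare observed (miss)
    tn: int  # no flare forecast, no flare observed (correct negative)


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally the 2x2 contingency table from paired binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            raise ValueError(f"labels must be 0 or 1, got ({t}, {p})")
    return Confusion(tp, fp, fn, tn)


def tss(c: Confusion) -> float:
    """True Skill Statistic: hit rate minus false-alarm rate (POD - POFD)."""
    if c.tp + c.fn == 0 or c.fp + c.tn == 0:
        raise ValueError("TSS needs at least one observed flare and one non-flare")
    return c.tp / (c.tp + c.fn) - c.fp / (c.fp + c.tn)


def hss(c: Confusion) -> float:
    """Heidke Skill Score: skill relative to a random forecast."""
    denom = (c.tp + c.fn) * (c.fn + c.tn) + (c.tp + c.fp) * (c.fp + c.tn)
    if denom == 0:
        raise ValueError("HSS is undefined for this contingency table")
    return 2 * (c.tp * c.tn - c.fn * c.fp) / denom


if __name__ == "__main__":
    # Hypothetical labels: the same observations scored against forecasts
    # from two models trained on different (hypothetical) data products.
    observed = [1, 1, 1, 0, 0, 0, 0, 0]
    model_a = [1, 1, 0, 0, 0, 0, 1, 0]
    model_b = [1, 0, 0, 0, 1, 1, 0, 0]
    for name, pred in (("A", model_a), ("B", model_b)):
        c = confusion_from_labels(observed, pred)
        print(f"model {name}: TSS={tss(c):.3f} HSS={hss(c):.3f}")
```

TSS does not depend on the ratio of flaring to non-flaring samples in the test set, which is one reason it is widely used for rare-event forecasting; HSS does depend on that ratio, so reporting both gives a fuller picture when data sources differ in how many events they record.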

Country of Origin
🇺🇸 United States

Page Count
34 pages

Category
Astrophysics: Solar and Stellar Astrophysics