SynDelay: A Synthetic Dataset for Delivery Delay Prediction
By: Liming Xu, Yunbo Long, Alexandra Brintrup
Potential Business Impact:
Helps predict delivery delays better with fake data.
Artificial intelligence (AI) is transforming supply chain management, yet progress in predictive tasks such as delivery delay prediction remains constrained by the scarcity of high-quality, openly available datasets. Existing datasets are often proprietary, small, or inconsistently maintained, which hinders reproducibility and benchmarking. We present SynDelay, a synthetic dataset designed for delivery delay prediction. SynDelay was generated using an advanced generative model trained on real-world data, so it preserves realistic delivery patterns while protecting privacy. Although not entirely free of noise or inconsistencies, it provides a challenging and practical testbed for advancing predictive modelling. To support adoption, we provide baseline results and evaluation metrics as initial benchmarks; these serve as reference points rather than state-of-the-art claims. SynDelay is publicly available through the Supply Chain Data Hub, an open initiative that promotes dataset sharing and benchmarking in supply chain AI. We encourage the community to contribute datasets, models, and evaluation practices to advance research in this area. All code is openly accessible at https://supplychaindatahub.org.
Similar Papers
Generation of synthetic delay time series for air transport applications
Machine Learning (CS)
Creates fake flight delays that help predict real ones.
Pre-Tactical Flight-Delay and Turnaround Forecasting with Synthetic Aviation Data
Machine Learning (CS)
Makes flight predictions possible without secret data.
RealDriveSim: A Realistic Multi-Modal Multi-Task Synthetic Dataset for Autonomous Driving
CV and Pattern Recognition
Creates realistic fake driving scenes for self-driving cars.