SynDelay: A Synthetic Dataset for Delivery Delay Prediction

Published: August 30, 2025 | arXiv ID: 2509.05325v1

By: Liming Xu, Yunbo Long, Alexandra Brintrup

Potential Business Impact:

Provides privacy-preserving synthetic data for training and benchmarking delivery delay prediction models.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Artificial intelligence (AI) is transforming supply chain management, yet progress in predictive tasks -- such as delivery delay prediction -- remains constrained by the scarcity of high-quality, openly available datasets. Existing datasets are often proprietary, small, or inconsistently maintained, hindering reproducibility and benchmarking. We present SynDelay, a synthetic dataset designed for delivery delay prediction. Generated using an advanced generative model trained on real-world data, SynDelay preserves realistic delivery patterns while ensuring privacy. Although not entirely free of noise or inconsistencies, it provides a challenging and practical testbed for advancing predictive modelling. To support adoption, we provide baseline results and evaluation metrics as initial benchmarks, serving as reference points rather than state-of-the-art claims. SynDelay is publicly available through the Supply Chain Data Hub, an open initiative promoting dataset sharing and benchmarking in supply chain AI. We encourage the community to contribute datasets, models, and evaluation practices to advance research in this area. All code is openly accessible at https://supplychaindatahub.org.

Country of Origin
🇬🇧 United Kingdom

Page Count
4 pages

Category
Computer Science:
Artificial Intelligence