ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset

Published: December 17, 2025 | arXiv ID: 2512.15230v1

By: Doğa Elitez, Paul Gessinger, Daniel Murnane and more

We introduce ColliderML, a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\sqrt{s} = 14$ TeV, mean pile-up $\mu = 200$). ColliderML provides one million events across ten Standard Model and Beyond Standard Model processes, plus extensive single-particle samples, all produced with modern next-to-leading-order matrix element calculation and showering, realistic per-event pile-up overlay, a validated OpenDataDetector geometry, and standard reconstructions. The release fills a major gap for machine learning (ML) research on detector-level data and is provided on the ML-friendly Hugging Face platform. We present the physics coverage and the generation, simulation, digitisation and reconstruction pipeline, describe the data format and access, and report initial collider physics benchmarks.

Category: Physics: High Energy Physics - Experiment