CTBench: Cryptocurrency Time Series Generation Benchmark
By: Yihao Ang, Qiang Wang, Qiang Huang and more
Potential Business Impact:
Creates synthetic cryptocurrency price data for testing trading strategies.
Synthetic time series are essential tools for data augmentation, stress testing, and algorithmic prototyping in quantitative finance. However, in cryptocurrency markets, characterized by 24/7 trading, extreme volatility, and rapid regime shifts, existing Time Series Generation (TSG) methods and benchmarks often fall short, jeopardizing practical utility. Most prior work (1) targets non-financial or traditional financial domains, (2) focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and (3) lacks critical financial evaluations, particularly for trading applications. To address these gaps, we introduce CTBench, the first comprehensive TSG benchmark tailored for the cryptocurrency domain. CTBench curates an open-source dataset from 452 tokens and evaluates TSG models across 13 metrics spanning 5 key dimensions: forecasting accuracy, rank fidelity, trading performance, risk assessment, and computational efficiency. A key innovation is a dual-task evaluation framework: (1) the Predictive Utility task measures how well synthetic data preserves temporal and cross-sectional patterns for forecasting, while (2) the Statistical Arbitrage task assesses whether reconstructed series support mean-reverting signals for trading. We benchmark eight representative models from five methodological families over four distinct market regimes, uncovering trade-offs between statistical fidelity and real-world profitability. Notably, CTBench offers model ranking analysis and actionable guidance for selecting and deploying TSG models in crypto analytics and strategy development.
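The abstract does not spell out how the Statistical Arbitrage task turns a reconstructed series into trading signals. As a rough illustration only, one common way to build a mean-reverting signal is to take the residual between the real price and its reconstruction and trade against large rolling z-scores. The sketch below assumes that setup; the function name `zscore_signals`, the `window` and `entry` parameters, and the toy prices are all hypothetical and are not taken from CTBench.

```python
from statistics import mean, stdev

def zscore_signals(real, reconstructed, window=5, entry=1.0):
    """Illustrative mean-reversion rule (an assumption, not the benchmark's method).

    Returns one position per time step: -1 (short), 0 (flat), or +1 (long),
    based on the rolling z-score of the residual real - reconstructed.
    """
    residuals = [r - s for r, s in zip(real, reconstructed)]
    positions = []
    for t in range(len(residuals)):
        if t + 1 < window:
            positions.append(0)  # not enough history for a rolling window yet
            continue
        hist = residuals[t + 1 - window : t + 1]  # last `window` residuals, including t
        sd = stdev(hist)
        if sd == 0:
            positions.append(0)  # flat residual: no signal
            continue
        z = (residuals[t] - mean(hist)) / sd
        if z > entry:
            positions.append(-1)  # price far above reconstruction: bet on a fall
        elif z < -entry:
            positions.append(1)   # price far below reconstruction: bet on a rise
        else:
            positions.append(0)
    return positions

# Toy data (made up): a flat reconstruction with one spike up and one spike down.
real = [100, 101, 100, 101, 100, 110, 100, 90, 100]
reconstructed = [100] * 9
positions = zscore_signals(real, reconstructed)
```

In this toy run the rule goes short at the upward spike and long at the downward spike, and stays flat elsewhere. A real evaluation would also need transaction costs, position sizing, and the risk metrics the abstract mentions, none of which are modeled here.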
Similar Papers
SynTSBench: Rethinking Temporal Pattern Learning in Deep Learning Models for Time Series
Machine Learning (CS)
Tests computer predictions to find best ones.
TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery
Machine Learning (CS)
Creates realistic data to test how things cause other things.
Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models
Machine Learning (CS)
Creates fake data to train smart computer programs.