Time Series Foundation Models: Benchmarking Challenges and Requirements
By: Marcel Meyer, Sascha Kaltenpoth, Kevin Zalipski, and more
Potential Business Impact:
Tests whether forecasting models are judged on genuinely unseen future data.
Time Series Foundation Models (TSFMs) represent a new paradigm for time series forecasting, offering zero-shot forecasting capabilities without domain-specific pre-training or fine-tuning. However, as with Large Language Models (LLMs), evaluating TSFMs is difficult: as training sets grow ever larger, ensuring the integrity of benchmarking data becomes increasingly challenging. Our investigation of existing TSFM evaluation highlights multiple challenges, ranging from the representativeness of benchmark datasets and the lack of spatiotemporal evaluation to risks of information leakage from overlapping and obscure datasets, and the memorization of global patterns caused by external shocks such as economic crises or pandemics. Our findings reveal widespread confusion regarding data partitions, risking inflated performance estimates and incorrect transfer of global knowledge to local time series. We argue for the development of robust evaluation methodologies to prevent pitfalls already observed in LLM and classical time series benchmarking, and call on the research community to design new, principled approaches, such as evaluation on truly out-of-sample future data, to safeguard the integrity of TSFM assessment.
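To make the kind of safeguard the abstract calls for concrete, the sketch below (not taken from the paper) shows two checks an evaluator might run in Python: a strict temporal cutoff so evaluation data lies entirely after the pretraining period, and a brute-force scan for benchmark windows that reappear verbatim in a pretraining corpus. The function names (`temporal_split`, `has_exact_overlap`), the window length, and the toy data are illustrative assumptions, not the authors' methodology.

```python
import numpy as np
import pandas as pd


def temporal_split(series: pd.Series, cutoff: pd.Timestamp):
    """Strict temporal holdout: everything at or after `cutoff` is reserved
    for evaluation and must never enter pretraining or fine-tuning data."""
    train = series[series.index < cutoff]
    test = series[series.index >= cutoff]
    return train, test


def has_exact_overlap(benchmark: pd.Series, corpus: list,
                      window: int = 64, tol: float = 1e-8) -> bool:
    """Brute-force leakage check: does any length-`window` slice of the
    benchmark series reappear numerically inside the pretraining corpus?"""
    b = benchmark.to_numpy(dtype=float)
    for corpus_series in corpus:
        c = corpus_series.to_numpy(dtype=float)
        if len(b) < window or len(c) < window:
            continue
        for i in range(len(b) - window + 1):
            target = b[i:i + window]
            for j in range(len(c) - window + 1):
                if np.allclose(c[j:j + window], target, atol=tol):
                    return True
    return False


# Toy usage with synthetic data: the pretraining corpus deliberately extends
# past the evaluation cutoff, so the overlap check reports leakage.
idx = pd.date_range("2019-01-01", periods=200, freq="D")
benchmark = pd.Series(np.sin(np.arange(200) / 7.0), index=idx)
corpus = [benchmark.iloc[100:].copy()]  # pretraining data reaching into the test period
train, test = temporal_split(benchmark, pd.Timestamp("2019-06-01"))
print("test-set leakage:", has_exact_overlap(test, corpus, window=32))
```

The quadratic window scan is only meant to illustrate the idea; at the scale of real TSFM pretraining corpora one would rely on hashing or approximate-matching techniques instead.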
Similar Papers
Re(Visiting) Time Series Foundation Models in Finance
Computational Finance
Teaches computers to predict stock prices better.
Foundation Model Forecasts: Form and Function
Machine Learning (CS)
Makes weather forecasts useful for planning trips.
StarEmbed: Benchmarking Time Series Foundation Models on Astronomical Observations of Variable Stars
Solar and Stellar Astrophysics
Helps computers understand star light patterns.