A Theoretical Analysis of Detecting Large Model-Generated Time Series
By: Junji Hou, Junzhou Zhao, Shuo Zhang and more
Potential Business Impact:
Finds fake computer-made number patterns.
Motivated by the increasing risks of data misuse and fabrication, we investigate the problem of identifying synthetic time series generated by Time-Series Large Models (TSLMs). While there is extensive research on detecting model-generated text, we find that existing methods are not applicable to time series data because of a fundamental modality difference: time series usually have lower information density and smoother probability distributions than text, which limits the discriminative power of token-based detectors. To address this issue, we examine the subtle distributional differences between real and model-generated time series and propose the contraction hypothesis, which states that model-generated time series, unlike real ones, exhibit progressively decreasing uncertainty under recursive forecasting. We formally prove this hypothesis under theoretical assumptions on model behavior and time series structure, showing that model-generated time series exhibit progressively concentrated distributions under recursive forecasting, which leads to uncertainty contraction. We also validate the hypothesis empirically across diverse datasets. Building on this insight, we introduce the Uncertainty Contraction Estimator (UCE), a white-box detector that aggregates uncertainty metrics over successive prefixes to identify TSLM-generated time series. Extensive experiments on 32 datasets show that UCE consistently outperforms state-of-the-art baselines, offering a reliable and generalizable solution for detecting model-generated time series.
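The idea of aggregating uncertainty over successive prefixes can be illustrated with a small sketch. This is not the paper's UCE: the abstract does not specify the uncertainty metrics or the aggregation rule, so everything here is an assumption. The names `toy_uncertainty` and `contraction_score` are hypothetical, the uncertainty proxy is a simple rolling statistic standing in for a TSLM's predictive distribution, and the aggregation is a least-squares slope chosen only for illustration.

```python
import numpy as np

def toy_uncertainty(prefix, window=16):
    """Hypothetical stand-in for a model's predictive uncertainty on a prefix.

    Assumption: a white-box detector would read this from the TSLM's
    predictive distribution; here we use the std of recent first differences.
    """
    diffs = np.diff(prefix[-(window + 1):])
    return float(np.std(diffs))

def contraction_score(series, uncertainty_fn=toy_uncertainty,
                      min_prefix=20, step=5):
    """Aggregate uncertainty over successive prefixes into one score.

    Assumption: the score is the negated least-squares slope of uncertainty
    against prefix length, so a larger score means stronger contraction.
    """
    series = np.asarray(series, dtype=float)
    lengths = np.arange(min_prefix, len(series) + 1, step)
    u = np.array([uncertainty_fn(series[:n]) for n in lengths])
    slope = np.polyfit(lengths, u, 1)[0]
    return -slope

# Synthetic illustration only: two series sharing the same noise draw,
# one with constant noise scale and one whose noise shrinks over time.
rng = np.random.default_rng(0)
t = np.arange(200)
noise = rng.normal(0.0, 0.3, size=200)
steady = np.sin(t / 10) + noise
shrinking = np.sin(t / 10) + noise * np.linspace(1.0, 0.1, 200)

score_steady = contraction_score(steady)
score_shrinking = contraction_score(shrinking)
print(score_steady, score_shrinking)
```

In this toy setup the series with shrinking noise receives the larger score, which mirrors the direction of the contraction hypothesis. A practical detector would replace `toy_uncertainty` with uncertainty read from the model under test and would calibrate a decision threshold on labeled data.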