The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss
By: Rongyao Cai, Yuxi Wan, Kexin Zhang, and more
Optimizing time series models via point-wise loss functions (e.g., MSE) relies on a flawed point-wise independent and identically distributed (i.i.d.) assumption that disregards the causal temporal structure, an issue of growing awareness that still lacks formal theoretical grounding. Focusing on the core independence issue under covariance stationarity, this paper provides a first-principles analysis of the Expectation of Optimization Bias (EOB), formalizing it information-theoretically as the discrepancy between the true joint distribution and its flawed i.i.d. counterpart. Our analysis reveals a fundamental paradox: the more deterministic and structured the time series, the more severe the bias induced by point-wise loss functions. We derive the first closed-form quantification of the non-deterministic EOB for linear and non-linear systems, and prove that EOB is an intrinsic data property, governed exclusively by sequence length and our proposed Structural Signal-to-Noise Ratio (SSNR). This theoretical diagnosis motivates a principled debiasing program that eliminates the bias through sequence length reduction and structural orthogonalization. We present a concrete solution that achieves both principles simultaneously via the DFT or DWT. Furthermore, a novel harmonized $\ell_p$ norm framework is proposed to rectify gradient pathologies in high-variance series. Extensive experiments validate the generality of EOB theory and the superior performance of the debiasing program.
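The contrast between a point-wise loss and the frequency-domain alternative described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the function names (`pointwise_mse`, `dft_mse`) and the choice of plain MSE over DFT coefficients are my assumptions; the paper's actual debiasing program, SSNR weighting, and harmonized $\ell_p$ norm are not reproduced here. The sketch relies only on the standard fact that the DFT coefficients of a covariance-stationary series are asymptotically uncorrelated, which is one way to realize "structural orthogonalization", and that truncating the spectrum to a few leading coefficients would realize "sequence length reduction".

```python
import numpy as np

def pointwise_mse(pred, target):
    # Standard point-wise MSE: implicitly treats every time step as an
    # i.i.d. sample, ignoring temporal dependence between steps.
    return np.mean((pred - target) ** 2)

def dft_mse(pred, target, keep=None):
    # Illustrative frequency-domain loss (names and design are assumptions,
    # not the paper's method). For a covariance-stationary series the DFT
    # coefficients are asymptotically uncorrelated, so comparing spectra
    # orthogonalizes the temporal structure; truncating to the first `keep`
    # coefficients additionally shortens the effective sequence length.
    fp = np.fft.rfft(pred)
    ft = np.fft.rfft(target)
    if keep is not None:
        fp, ft = fp[:keep], ft[:keep]
    return np.mean(np.abs(fp - ft) ** 2)
```

A quick check on a toy series shows both losses vanish for a perfect prediction, while a constant offset is penalized by both; the `keep` argument lets one trade resolution for a shorter, orthogonalized comparison.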
Similar Papers
The Bias-Variance Tradeoff in Data-Driven Optimization: A Local Misspecification Perspective
Machine Learning (Stat)
Improves computer learning by balancing guessing and certainty.
Time-Varying Optimization for Streaming Data Via Temporal Weighting
Machine Learning (CS)
Learns from changing information to make better choices.
Beyond MSE: Ordinal Cross-Entropy for Probabilistic Time Series Forecasting
Machine Learning (CS)
Predicts future numbers more reliably, even with bad data.