TimeCluster with PCA is Equivalent to Subspace Identification of Linear Dynamical Systems
By: Christian L. Hines, Samuel Spillard, Daniel P. Martin
Potential Business Impact:
Finds patterns in long, changing data.
TimeCluster is a visual analytics technique for discovering structure in long multivariate time series by projecting overlapping windows of data into a low-dimensional space. We show that, when Principal Component Analysis (PCA) is chosen as the dimensionality reduction technique, this procedure is mathematically equivalent to classical linear subspace identification (a block-Hankel matrix followed by Singular Value Decomposition (SVD)). Both approaches extract the same low-dimensional linear subspace from the time series data. We first review the TimeCluster method and the theory of subspace system identification. We then show that forming the sliding-window matrix of a time series yields a Hankel matrix, so applying PCA (via SVD) to this matrix recovers the same principal directions as subspace identification. The cluster coordinates produced by TimeCluster therefore coincide with those obtained by subspace identification. We present experiments on synthetic and real dynamical signals confirming that the two embeddings coincide. Finally, we discuss future opportunities enabled by this equivalence, including forecasting from the identified state space, streaming/online extensions, incorporating and visualising external inputs, and robust techniques for displaying underlying trends in corrupted data.
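The construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names (`sliding_window_matrix`, `pca_scores`, `hankel_svd_coordinates`), the toy signal, the window length of 20 and the choice of 2 components are all assumptions made for illustration. It also assumes that the same mean-centering is applied on both sides, and it uses only a plain truncated SVD of the block-Hankel matrix, not a full subspace identification algorithm with past/future projections.

```python
import numpy as np

def sliding_window_matrix(x, window):
    """Stack overlapping windows of a (T, d) series as rows (stride 1).

    Row i is x[i:i+window] flattened, so the transpose of the result is a
    block-Hankel matrix: block (j, i) equals x[i + j].
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n_windows = x.shape[0] - window + 1
    return np.stack([x[i:i + window].ravel() for i in range(n_windows)])

def pca_scores(X, k):
    """PCA via SVD: project the centered windows onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def hankel_svd_coordinates(X, k):
    """Truncated SVD of the (centered) block-Hankel matrix H = Xc^T.

    Returns the scaled right singular vectors, one k-dimensional point per window.
    Centering here is an assumption made so that both routes see the same matrix.
    """
    H = (X - X.mean(axis=0)).T
    _, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (s[:k, None] * Vt[:k]).T

# Toy two-channel signal (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
t = np.arange(300)
series = np.column_stack([np.sin(0.2 * t), 0.5 * np.cos(0.2 * t)])
series += 0.01 * rng.standard_normal(series.shape)

X = sliding_window_matrix(series, window=20)
scores = pca_scores(X, k=2)
coords = hankel_svd_coordinates(X, k=2)

# Singular vectors are only defined up to sign (and rotation when singular
# values tie), so compare Gram matrices, which are invariant to both.
same_subspace = np.allclose(scores @ scores.T, coords @ coords.T)
```

In this sketch the two embeddings are compared through their Gram matrices rather than coordinate by coordinate, because the SVD fixes the axes only up to sign flips, and up to a rotation when singular values are nearly equal.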
Similar Papers
An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis
Machine Learning (Stat)
Finds hidden patterns in data by grouping things.
Ultralow-dimensionality reduction for identifying critical transitions by spatial-temporal PCA
Machine Learning (Stat)
Finds hidden patterns to predict big changes.
Highly robust factored principal component analysis for matrix-valued outlier accommodation and explainable detection via matrix minimum covariance determinant
Methodology
Finds bad data points in complex pictures.