Conditional Cauchy-Schwarz Divergence for Time Series Analysis: Kernelized Estimation and Applications in Clustering and Fraud Detection
By: Jiayi Wang
Potential Business Impact:
Finds unusual patterns in data to spot fraud.
We study the conditional Cauchy-Schwarz divergence (C-CSD) as a symmetric, density-free measure for time series analysis. We derive a practical kernel-based estimator that uses radial basis function (RBF) kernels on both the conditioning and output spaces. It includes two numerical stabilizations: a symmetric logarithmic form with an epsilon ridge, and a robust bandwidth selection rule based on the interquartile range (IQR). Median-heuristic bandwidths are applied to window vectors, and effective-rank filtering is used to avoid degenerate kernels. We demonstrate the framework in two applications. In time series clustering, conditioning on the time index and comparing scalar series values yields a pairwise C-CSD dissimilarity. Bandwidths are selected on the training split; k-medoids clustering on the precomputed distance matrix is then performed on the test split and evaluated with normalized mutual information (NMI). In fraud detection, we condition on sliding transaction windows and compare the magnitude of value changes together with categorical and merchant change indicators. Each query window is scored by contrasting a global normal-reference mixture against a same-account local-history mixture with recency decay and change-flag weighting. Account-level decisions are obtained by taking the maximum of the window scores. Experiments on benchmark time series datasets and a transactional fraud detection dataset demonstrate stable estimation and effective performance under a strictly leak-free evaluation protocol.
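The kernel-based C-CSD estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the paper's exact estimator. It relies on the plug-in identity ∫∫ p(y|x) q(y|x) dx dy = E_p(x,y)[ q(x,y) / (p(x) q(x)) ] and replaces each density with an unnormalized RBF Parzen sum, whose normalizing constants cancel in the final log combination. The cross term is estimated once on each sample set, so the result is symmetric in p and q. The function names `rbf_gram`, `_cross_term`, `ccsd` and the default `eps` are hypothetical.

```python
import numpy as np


def rbf_gram(a, b, sigma):
    """Gaussian (RBF) Gram matrix between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def _cross_term(xa, ya, xb, yb, sigma_x, sigma_y, eps):
    """Estimate E_a[ b(x, y) / (a(x) b(x)) ] = ∫∫ a(y|x) b(y|x) using samples of a."""
    k_aa = rbf_gram(xa, xa, sigma_x)  # Parzen sums for a(x_j)
    k_ab = rbf_gram(xa, xb, sigma_x)  # Parzen sums for b(x_j)
    l_ab = rbf_gram(ya, yb, sigma_y)  # output-space similarities to b's samples
    num = (k_ab * l_ab).sum(axis=1)            # ~ b(x_j, y_j)
    den = k_aa.sum(axis=1) * k_ab.sum(axis=1)  # ~ a(x_j) * b(x_j)
    return float(np.sum(num / (den + eps)))    # eps ridge guards tiny denominators


def ccsd(xp, yp, xq, yq, sigma_x, sigma_y, eps=1e-8):
    """Symmetric log-form estimate of D_CS(p(y|x) || q(y|x))."""
    p2 = _cross_term(xp, yp, xp, yp, sigma_x, sigma_y, eps)    # ∫∫ p(y|x)^2
    q2 = _cross_term(xq, yq, xq, yq, sigma_x, sigma_y, eps)    # ∫∫ q(y|x)^2
    c_pq = _cross_term(xp, yp, xq, yq, sigma_x, sigma_y, eps)  # cross term on p samples
    c_qp = _cross_term(xq, yq, xp, yp, sigma_x, sigma_y, eps)  # cross term on q samples
    return float(np.log(p2 + eps) + np.log(q2 + eps)
                 - np.log(c_pq + eps) - np.log(c_qp + eps))
```

For clustering, `x` would be the time index and `y` the scalar series values, so the dissimilarity between series i and j is `ccsd(t, y_i, t, y_j, sigma_x, sigma_y)`. Because the self terms and the cross terms share one code path, comparing a series with itself returns exactly 0.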
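The bandwidth selection and degenerate-kernel checks can be sketched with three small helpers. The exact IQR-based rule is not given here, so Silverman's rule of thumb, 0.9 · min(σ̂, IQR/1.34) · n^(-1/5), is used as a stand-in. Effective rank is taken to be the exponential of the Shannon entropy of the normalized Gram eigenvalues, which is one common definition. All function names, and the `min_erank` threshold mentioned after the code, are hypothetical.

```python
import numpy as np


def median_heuristic(windows):
    """Median pairwise Euclidean distance between window vectors (one per row)."""
    w = np.asarray(windows, dtype=float).reshape(len(windows), -1)
    diff = w[:, None, :] - w[None, :, :]
    dists = np.sqrt((diff**2).sum(axis=-1))
    iu = np.triu_indices(len(w), k=1)  # each distinct pair counted once
    return float(np.median(dists[iu]))


def iqr_bandwidth(values):
    """Silverman-style bandwidth using the IQR as a robust spread estimate."""
    v = np.asarray(values, dtype=float).ravel()
    q75, q25 = np.percentile(v, [75, 25])
    spread = min(np.std(v, ddof=1), (q75 - q25) / 1.34)
    if spread <= 0:  # IQR or std collapsed (e.g. many ties): fall back
        spread = np.std(v, ddof=1) or 1.0
    return float(0.9 * spread * len(v) ** (-1 / 5))


def effective_rank(gram):
    """exp(Shannon entropy) of the normalized eigenvalue spectrum of a Gram matrix."""
    eig = np.clip(np.linalg.eigvalsh(gram), 0.0, None)  # drop tiny negative noise
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))
```

A Gram matrix `K` could then be skipped when `effective_rank(K) < min_erank`. As reference points, the identity matrix of size n has effective rank n and the all-ones matrix has effective rank 1. Where exactly to draw the threshold is an assumption left to the user.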
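The clustering step runs k-medoids on a precomputed distance matrix. The specific k-medoids variant is not stated, so this sketch uses the simple alternating (Voronoi-iteration) scheme rather than full PAM. It assumes the pairwise C-CSD dissimilarities among test-split series have already been collected in a symmetric n × n matrix with a zero diagonal. The function name and its parameters are hypothetical.

```python
import numpy as np


def kmedoids_precomputed(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed n x n dissimilarity matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(n_iter):
        labels = np.argmin(d[:, medoids], axis=1)  # assign each item to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster empties
            # new medoid: member with the smallest total distance to its cluster
            costs = d[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids
    labels = np.argmin(d[:, medoids], axis=1)
    return labels, medoids
```

The resulting labels can be compared with the ground-truth classes using, for instance, `sklearn.metrics.normalized_mutual_info_score(y_true, labels)`. Because medoids are actual data points, k-medoids needs only the dissimilarity matrix, which suits a divergence that has no explicit feature-space mean.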
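The account-level decision rule can be illustrated with a small sketch. Only the max aggregation comes from the description; the exact weighting of the local-history mixture is not given. The exponential recency decay, the multiplicative change-flag boost, their default values, and the alert threshold are therefore hypothetical choices. Per-window scores are assumed to be computed already.

```python
import numpy as np


def history_weights(ages, change_flags, decay=0.1, flag_boost=2.0):
    """Hypothetical mixture weights for an account's past windows.

    ages: how many windows ago each history window occurred (0 = most recent).
    change_flags: 1 if the window has a categorical/merchant change, else 0.
    """
    ages = np.asarray(ages, dtype=float)
    flags = np.asarray(change_flags, dtype=float)
    w = np.exp(-decay * ages)                   # exponential recency decay (assumed form)
    w = w * (1.0 + (flag_boost - 1.0) * flags)  # up-weight flagged windows (assumed)
    return w / w.sum()                          # normalize so weights form a mixture


def account_scores(window_scores):
    """Aggregate per-window scores into one score per account via the maximum."""
    return {acct: float(np.max(s)) for acct, s in window_scores.items()}


def flag_accounts(window_scores, threshold):
    """Accounts whose maximum window score exceeds a (hypothetical) threshold."""
    return sorted(a for a, s in account_scores(window_scores).items() if s > threshold)
```

Max aggregation means a single strongly anomalous window is enough to raise an account-level alert, which favors recall on accounts with a short burst of fraudulent activity.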
Similar Papers
Fast and Scalable Score-Based Kernel Calibration Tests
Machine Learning (Stat)
Checks if computer predictions are trustworthy.
Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality
Machine Learning (Stat)
Lets computers share data safely for better guessing.
Copula-Stein Discrepancy: A Generator-Based Stein Operator for Archimedean Dependence
Machine Learning (Stat)
Finds hidden patterns in how things are connected.