Text Anomaly Detection with Simplified Isolation Kernel
By: Yang Cao, Sikun Yang, Yujiu Yang and more
Potential Business Impact:
Finds weird text faster with less computer power.
Two-step approaches combining pre-trained large language model embeddings and anomaly detectors demonstrate strong performance in text anomaly detection by leveraging rich semantic representations. However, high-dimensional dense embeddings extracted by large language models pose challenges due to substantial memory requirements and high computation time. To address this challenge, we introduce the Simplified Isolation Kernel (SIK), which maps high-dimensional dense embeddings to lower-dimensional sparse representations while preserving crucial anomaly characteristics. SIK has linear time complexity and significantly reduces space complexity through its innovative boundary-focused feature mapping. Experiments across 7 datasets demonstrate that SIK achieves better detection performance than 11 state-of-the-art (SOTA) anomaly detection algorithms while maintaining computational efficiency and low memory cost. All code and demonstrations are available at https://github.com/charles-cao/SIK.
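To make the idea of mapping dense vectors to sparse codes more concrete, here is a minimal sketch of a generic isolation-style feature map. It is not the SIK algorithm itself, whose details the abstract does not give; for that, see the authors' repository. The sketch assumes a hypersphere-based partitioning: each partition samples `psi` points as ball centers, each ball's radius is the distance to the nearest other center, and a point is encoded by the index of the ball that covers it, or -1 if none does. The function names, the parameters `psi` and `t`, and the scoring rule are illustrative assumptions, and 2-D tuples stand in for language model embeddings.

```python
import math
import random


def fit_partitions(train, psi=8, t=100, seed=0):
    """Build t partitions; each holds psi sampled centers and their radii."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(t):
        centers = rng.sample(train, psi)
        # Radius of a center = distance to its nearest neighbour in the sample.
        radii = [
            min(math.dist(c, o) for j, o in enumerate(centers) if j != i)
            for i, c in enumerate(centers)
        ]
        partitions.append((centers, radii))
    return partitions


def sparse_code(x, partitions):
    """Return one integer per partition: index of the covering ball, or -1."""
    code = []
    for centers, radii in partitions:
        # Find the nearest center, then check whether its ball covers x.
        i = min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
        code.append(i if math.dist(x, centers[i]) <= radii[i] else -1)
    return code


def anomaly_score(x, partitions):
    """Fraction of partitions in which x is covered by no ball (assumed rule)."""
    code = sparse_code(x, partitions)
    return sum(1 for c in code if c == -1) / len(code)


# Toy data: 2-D points stand in for high-dimensional embeddings.
data_rng = random.Random(1)
train = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(200)]
parts = fit_partitions(train)

inlier_score = anomaly_score((0.0, 0.0), parts)
outlier_score = anomaly_score((10.0, 10.0), parts)
```

Each point is stored as `t` small integers rather than a dense float vector, which is where the memory saving in this kind of representation comes from. In the toy data the point at (10, 10) lies outside every ball and so receives the highest possible score. How SIK actually builds and scores its features may differ from this sketch.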
Similar Papers
Isolation-based Spherical Ensemble Representations for Anomaly Detection
Machine Learning (CS)
Finds weird patterns in data faster and better.
Sparse, self-organizing ensembles of local kernels detect rare statistical anomalies
Machine Learning (CS)
Finds hidden problems in complex data.
Kernel Embeddings and the Separation of Measure Phenomenon
Machine Learning (Stat)
Finds differences between data perfectly.