AeroSketch: Near-Optimal Time Matrix Sketch Framework for Persistent, Sliding Window, and Distributed Streams
By: Hanyan Yin, Dongxie Wen, Jiajun Li and more
Potential Business Impact:
Makes updating compact summaries of large streaming datasets much faster.
Many real-world matrix datasets arrive as high-throughput vector streams, making it impractical to store or process them in their entirety. To enable real-time analytics under limited computational, memory, and communication resources, matrix sketching techniques have been developed over recent decades to provide compact approximations of such streaming data. Some algorithms have achieved optimal space and communication complexity. However, these approaches often require frequent time-consuming matrix factorization operations. In particular, under tight approximation error bounds, each matrix factorization computation incurs cubic time complexity, thereby limiting their update efficiency. In this paper, we introduce AeroSketch, a novel matrix sketching framework that leverages recent advances in randomized numerical linear algebra (RandNLA). AeroSketch achieves optimal communication and space costs while delivering near-optimal update time complexity (within logarithmic factors) across persistent, sliding window, and distributed streaming scenarios. Extensive experiments on both synthetic and real-world datasets demonstrate that AeroSketch consistently outperforms state-of-the-art methods in update throughput. In particular, under tight approximation error constraints, AeroSketch reduces the cubic time complexity to the quadratic level. Meanwhile, it maintains comparable approximation quality while retaining optimal communication and space costs.
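To make the update-time bottleneck concrete, here is a minimal sketch of a classic deterministic streaming matrix sketch, Frequent Directions. The abstract does not name this algorithm, and this is not AeroSketch itself; it is used here only as an assumed, well-known example of a method that must periodically recompute a matrix factorization. The class name, the parameter `ell` (sketch size), and the helper `covariance_error` are illustrative choices, not taken from the paper.

```python
import numpy as np


class FrequentDirections:
    """Illustrative Frequent Directions sketch (not the paper's method).

    Keeps a (2 * ell) x d buffer B so that B.T @ B approximates A.T @ A
    for the stream of rows A seen so far.
    """

    def __init__(self, d, ell):
        self.d, self.ell = d, ell
        self.B = np.zeros((2 * ell, d))
        self.next_row = 0  # index of the next free row in the buffer

    def update(self, row):
        # When the buffer is full, pay for an SVD to free up space.
        if self.next_row == self.B.shape[0]:
            self._shrink()
        self.B[self.next_row] = row
        self.next_row += 1

    def _shrink(self):
        # This factorization is the expensive step: with a buffer of
        # 2 * ell rows it costs O(d * ell^2), and ell grows as the
        # target approximation error shrinks.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[self.ell] ** 2 if self.ell < len(s) else 0.0
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        self.B[:] = 0.0
        self.B[: len(s)] = s_shrunk[:, None] * Vt
        self.next_row = self.ell  # rows from index ell onward are now zero


def covariance_error(A, B):
    """Spectral-norm error between A.T @ A and B.T @ B."""
    return np.linalg.norm(A.T @ A - B.T @ B, 2)


# Usage: stream 500 random rows of dimension 30 through a sketch with ell = 10.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 30))
fd = FrequentDirections(d=30, ell=10)
for a in A:
    fd.update(a)
print(fd.B.shape, covariance_error(A, fd.B))
```

The sketch stores only `2 * ell` rows regardless of stream length, and Frequent Directions guarantees a spectral error of at most `||A||_F^2 / ell`. The repeated SVD in `_shrink` is the kind of factorization cost the abstract describes as limiting update efficiency.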
Similar Papers
Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation
Machine Learning (CS)
Finds important patterns in fast-changing data.
Sketch Disaggregation Across Time and Space
Networking and Internet Architecture
Splits data summaries across many network devices.
Matrix Product Sketching via Coordinated Sampling
Data Structures and Algorithms
Makes computer math faster for big data.