DTC: Real-Time and Accurate Distributed Triangle Counting in Fully Dynamic Graph Streams
By: Wei Xuan , Yan Liang , Huawei Cao and more
Potential Business Impact:
Counts triangles in changing online networks.
Triangle counting is a fundamental problem in graph mining, essential for analyzing graph streams with arbitrary edge orders. However, exact counting becomes impractical due to the massive size of real-world graph streams. To address this, approximate algorithms have been developed, but existing distributed streaming algorithms lack adaptability and struggle with edge deletions. In this article, we propose DTC, a novel family of single-pass distributed streaming algorithms for global and local triangle counting in fully dynamic graph streams. Our DTC-AR algorithm accurately estimates triangle counts without prior knowledge of graph size, leveraging multi-machine resources. Additionally, we introduce DTC-FD, an algorithm tailored for fully dynamic graph streams, incorporating edge insertions and deletions. Using Random Pairing and future edge insertion compensation, DTC-FD achieves unbiased and accurate approximations across multiple machines. Experimental results demonstrate significant improvements over baselines. DTC-AR achieves up to $2029.4\times$ and $27.1\times$ more accuracy, while maintaining the best trade-off between accuracy and storage space. DTC-FD reduces estimation errors by up to $32.5\times$ and $19.3\times$, scaling linearly with graph stream size. These findings highlight the effectiveness of our proposed algorithms in tackling triangle counting in real-world scenarios. The source code and datasets are released and available at \href{https://github.com/wayne4s/srds-dtc.git}{https://github.com/wayne4s/srds-dtc.git}.
Similar Papers
Triangle Counting in Hypergraph Streams: A Complete and Practical Approach
Data Structures and Algorithms
Finds hidden patterns in complex networks.
Efficient Approximate Temporal Triangle Counting in Streaming with Predictions
Data Structures and Algorithms
Counts connections in fast-changing networks.
Accelerating Triangle Counting with Real Processing-in-Memory Systems
Hardware Architecture
Finds patterns in networks much faster.