Performance-Driven Optimization of Parallel Breadth-First Search
By: Marati Bhaskar, Raghavendra Kanakagiri
Potential Business Impact:
Makes computer searches on connected data faster.
Breadth-first search (BFS) is a fundamental graph algorithm that presents significant challenges for parallel implementation due to irregular memory access patterns, load imbalance and synchronization overhead. In this paper, we introduce a set of optimization strategies for parallel BFS on multicore systems, including hybrid traversal, bitmap-based visited set, and a novel non-atomic distance update mechanism. We evaluate these optimizations across two different architectures - a 24-core Intel Xeon platform and a 128-core AMD EPYC system - using a diverse set of synthetic and real-world graphs. Our results demonstrate that the effectiveness of optimizations varies significantly based on graph characteristics and hardware architecture. For small-diameter graphs, our hybrid BFS implementation achieves speedups of 3-8x on the Intel platform and $3-10\times$ on the AMD system compared to a conventional parallel BFS implementation. However, the performance of large-diameter graphs is more nuanced, with some of the optimizations showing varied performance across platforms including performance degradation in some cases.
Similar Papers
How Fast Can Graph Computations Go on Fine-grained Parallel Architectures
Distributed, Parallel, and Cluster Computing
Makes computers solve graph puzzles much faster.
Parallel Complexity of Depth-First-Search and Maximal path
Data Structures and Algorithms
Helps computers map tricky networks faster.
Deterministic Self-Stabilizing BFS Construction in Constant Space
Distributed, Parallel, and Cluster Computing
Computers build a network map with tiny memory.