How Fast Can Graph Computations Go on Fine-grained Parallel Architectures
By: Yuqing Wang, Charles Colley, Brian Wheatman and more
Potential Business Impact:
Makes computers solve graph puzzles much faster.
Large-scale graph problems are of critical and growing importance, yet parallel architectures have historically provided little support for them. In the spirit of co-design, we ask: how fast can graph computing go on a fine-grained architecture? We explore the possibilities of an architecture optimized for fine-grained parallelism, natural programming, and the irregularity and skew found in real-world graphs. Using two graph benchmarks, PageRank (PR) and Breadth-First Search (BFS), we evaluate a Fine-Grained Graph architecture, UpDown, to explore what performance co-design can achieve. To demonstrate programmability, we wrote five variants of these algorithms. Simulations of up to 256 nodes (524,288 lanes) and projections to 16,384 nodes (33M lanes) show that the UpDown system can achieve 637K GTEPS on PR and 989K GTEPS on BFS on RMAT, exceeding the best prior results by 5x and 100x, respectively.
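The scale figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names and the `gteps` helper are hypothetical, it assumes the lanes-per-node ratio of the simulated system carries over unchanged to the projected one, and it uses the conventional reading of GTEPS as billions of traversed edges per second.

```python
# Figures quoted in the abstract.
SIM_NODES = 256        # largest simulated system
SIM_LANES = 524_288    # lanes in that system
PROJ_NODES = 16_384    # projected system size

# Assumption: the lanes-per-node ratio is the same in the projected system.
lanes_per_node = SIM_LANES // SIM_NODES
proj_lanes = PROJ_NODES * lanes_per_node


def gteps(edges_traversed: int, seconds: float) -> float:
    """Giga-traversed edges per second (hypothetical helper, conventional definition)."""
    return edges_traversed / seconds / 1e9


print(lanes_per_node)  # 2048
print(proj_lanes)      # 33554432, which the abstract rounds to 33M
```

Under that assumption, 2,048 lanes per node gives 33,554,432 lanes at 16,384 nodes, which matches the "33M lanes" stated in the abstract.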
Similar Papers
Performance-Driven Optimization of Parallel Breadth-First Search
Distributed, Parallel, and Cluster Computing
Makes computer searches on connected data faster.
Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather
Hardware Architecture
Makes computers process complex data much faster.
Beyond Exascale: Dataflow Domain Translation on a Cerebras Cluster
Distributed, Parallel, and Cluster Computing
Speeds up computer simulations for science and engineering.