Disaggregated Design for GPU-Based Volumetric Data Structures
By: Massimiliano Meneghin, Ahmed H. Mahmoud
Potential Business Impact:
Makes computer simulations run up to 3 times faster.
Volumetric data structures typically prioritize data locality, focusing on efficient memory access patterns. This singular focus can neglect other critical performance factors, such as occupancy, communication, and kernel fusion. We introduce a novel disaggregated design that rebalances trade-offs between locality and these objectives: reducing communication overhead on distributed memory architectures, mitigating register pressure in complex boundary conditions, and enabling kernel fusion. We provide a thorough analysis of its benefits on a single-node multi-GPU Lattice Boltzmann Method (LBM) solver. Our evaluation spans dense, block-sparse, and multi-resolution discretizations, demonstrating our design's flexibility and efficiency. Leveraging this approach, we achieve up to a 3× speedup over state-of-the-art solutions.
Similar Papers
Resolution Where It Counts: Hash-based GPU-Accelerated 3D Reconstruction via Variance-Adaptive Voxel Grids
Graphics
Creates detailed 3D shapes faster and with less memory.
Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters
Distributed, Parallel, and Cluster Computing
Lets computers share memory to work faster.
Contiguous Storage of Grid Data for Heterogeneous Computing
Computational Engineering, Finance, and Science
Makes computer simulations run faster on new chips.