Towards a Higher Roofline for Matrix-Vector Multiplication in Matrix-Free HOSFEM
By: Zijian Cao, Qiao Sun, Tiangong Zhang, and more
Potential Business Impact:
Makes computer simulations run much faster.
The high-order/spectral finite element method (HOSFEM) is a widely used numerical method for solving PDEs, and its performance relies primarily on axhelm, a matrix-free kernel for element-local matrix-vector multiplications. In axhelm, geometric factors account for over half of the memory accesses but contribute little to the computational workload. This imbalance significantly lowers the performance roofline, so further optimization of tensor contraction, the core computation in axhelm, yields only minimal improvements. To overcome this bottleneck, we propose a low-cost on-the-fly recalculation of geometric factors for trilinear elements, which unlocks substantial potential for optimizing tensor contraction. The proposed approach is implemented in Nekbone, a standard HOSFEM benchmark. With optimizations such as merging scalar factors, partial recalculation, Tensor Core acceleration, and constant memory utilization, performance reaches 85%-100% of the higher roofline. The optimized kernels achieve speedups of 1.74x-4.10x on NVIDIA A100 and 1.99x-3.77x on DCU K100, which translates into a 1.12x-1.40x speedup for Nekbone as a whole.
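To make the trade-off concrete, here is a minimal Python sketch of what on-the-fly recalculation can look like. It is not the paper's implementation. It assumes each trilinear element is described by its 8 vertex coordinates on the reference cube [-1, 1]^3, and that axhelm consumes the six unique entries of the usual symmetric Laplacian factor G = w * det(J) * inv(J) * inv(J)^T at every quadrature point. All function names, and the `bytes_per_element` traffic model (which counts only u, w, and geometric data in 8-byte words), are hypothetical and chosen for illustration.

```python
# Illustrative sketch (hypothetical names): recompute the geometric factors of a
# trilinear hexahedron on the fly from its 8 vertices instead of loading them.
from itertools import product

# Reference-vertex signs on [-1, 1]^3 in lexicographic order (r varies fastest).
SIGNS = [(sr, ss, st) for st, ss, sr in product((-1.0, 1.0), repeat=3)]


def jacobian(verts, r, s, t):
    """J[a][b] = d x_a / d r_b of the trilinear map at reference point (r, s, t)."""
    J = [[0.0] * 3 for _ in range(3)]
    for (sr, ss, st), x in zip(SIGNS, verts):
        # Derivatives of the shape function N = (1 + sr*r)(1 + ss*s)(1 + st*t) / 8
        dN = (sr * (1 + ss * s) * (1 + st * t) / 8,
              ss * (1 + sr * r) * (1 + st * t) / 8,
              st * (1 + sr * r) * (1 + ss * s) / 8)
        for a in range(3):
            for b in range(3):
                J[a][b] += x[a] * dN[b]
    return J


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m, det):
    """Inverse of a 3x3 matrix as adjugate / determinant."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def geometric_factors(verts, r, s, t, w):
    """Six unique entries (G00, G01, G02, G11, G12, G22) of
    G = w * det(J) * inv(J) * inv(J)^T at one quadrature point with weight w."""
    J = jacobian(verts, r, s, t)
    det = det3(J)
    Ji = inv3(J, det)  # Ji[i][k] = d r_i / d x_k
    pairs = ((0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2))
    return tuple(w * det * sum(Ji[i][k] * Ji[j][k] for k in range(3))
                 for i, j in pairs)


def bytes_per_element(order, stored_factors=6, word=8):
    """Crude per-element traffic model (assumption): read u, write w, plus either
    stored_factors words per point or 8 vertices x 3 coordinates per element."""
    pts = (order + 1) ** 3
    io = 2 * pts * word
    stored = io + stored_factors * pts * word
    recomputed = io + 8 * 3 * word
    return stored, recomputed


if __name__ == "__main__":
    # Axis-aligned cube [0, 4]^3: J = 2*I, so G = w * 8 * I / 4 = 2*w*I.
    cube = [(2 * (1 + sr), 2 * (1 + ss), 2 * (1 + st)) for sr, ss, st in SIGNS]
    print(geometric_factors(cube, 0.3, -0.5, 0.1, w=1.0))
    print(bytes_per_element(7))
```

With the defaults, `bytes_per_element(7)` returns `(32768, 8384)`: under this model the six stored factors are 75% of the traffic, while recomputation needs only 192 bytes of vertex data per element. The cost moves into per-point arithmetic (a Jacobian, its determinant, and its inverse), which is why the recalculation must stay cheap for the trade to pay off. The model also ignores caches, the derivative matrix, and quadrature weights.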
Similar Papers
Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM
Distributed, Parallel, and Cluster Computing
Helps supercomputers run math problems faster.
A Hybrid High-Order Finite Element Method for a Nonlocal Nonlinear Problem of Kirchhoff Type
Numerical Analysis
Solves hard math problems for engineering and science.