Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach
By: Qiao Zhang, Rabab Alomairy, Dali Wang, and more
Potential Business Impact:
Makes computers solve problems faster and use less power.
General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms to leverage mixed-precision computation for improved performance and energy efficiency. This research introduces an adaptive mixed-precision GEMM framework that supports different precision formats at fine-grained tile/block granularity. We utilize the PaRSEC runtime system to balance workloads across heterogeneous architectures. Performance scales well on the ARM CPU-based Fugaku supercomputer, the Nvidia GPU-based A100 DGX, and the AMD GPU-based Frontier supercomputer. By bridging algorithmic advancements and hardware innovations, this research aims to enhance computational efficiency and accuracy, driving transformative progress across applications.
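The tile-centric idea in the abstract can be illustrated with a small sketch: the matrix is processed in tiles, each tile pair is multiplied in a precision selected per tile, and partial products are accumulated at higher precision. This is a minimal illustration, not the paper's actual framework; the norm-based `tile_precision` heuristic and the tile size are assumptions for demonstration, and a real implementation would dispatch tiles through a runtime system such as PaRSEC.

```python
import numpy as np

def tile_precision(tile, threshold=1e-3):
    # Hypothetical heuristic: tiles with small norm are assumed to
    # tolerate reduced precision (float16); others use float32.
    return np.float16 if np.linalg.norm(tile) < threshold else np.float32

def mixed_precision_gemm(A, B, tile=64):
    """Tile-centric GEMM sketch: each pair of tiles is multiplied in the
    common (higher) of the two tiles' chosen precisions, and partial
    products are accumulated in float64."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.float64)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                a = A[i:i+tile, p:p+tile]
                b = B[p:p+tile, j:j+tile]
                # Promote to the safer of the two per-tile precisions.
                prec = np.promote_types(tile_precision(a), tile_precision(b))
                C[i:i+tile, j:j+tile] += (
                    a.astype(prec) @ b.astype(prec)
                ).astype(np.float64)
    return C
```

In a production framework, each inner-loop tile multiply would become an independent task whose precision and placement (CPU core, GPU stream) are decided by the runtime scheduler rather than a fixed loop nest.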
Similar Papers
The Cambrian Explosion of Mixed-Precision Matrix Multiplication for Quantized Deep Learning Inference
Computation and Language
Makes computers do math faster for AI.
Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine
Distributed, Parallel, and Cluster Computing
Makes supercomputers run science problems 1.6x faster.
Optimizing GEMM for Energy and Performance on Versal ACAP Architectures
Hardware Architecture
Makes computer math faster and uses less power.