Engineering Compressed Matrix Multiplication with the Fast Walsh-Hadamard Transform
By: Joel Andersson, Matti Karppa
We present an implementation of Pagh's compressed matrix multiplication algorithm, a randomized algorithm that constructs sketches of matrices to compute an unbiased estimate of their product. By leveraging fast polynomial multiplication via the FFT, the algorithm achieves high performance when the product matrix is sparse or contains only a small number of entries with magnitudes significantly larger than the rest. We show empirically that the algorithm is practical and can outperform state-of-the-art DGEMM implementations when the product matrix has few nonzero entries or is otherwise dominated by a small subset of elements with large magnitude. As a minor theoretical contribution, we replace the FFT with the Fast Walsh-Hadamard Transform (FWHT) in sketched multiplication, preserving all correctness and variance guarantees of the original algorithm. Experiments with our carefully engineered multithreaded CPU implementation for dense double-precision matrices on 64-core CPU nodes across a range of synthetic benchmarks, exhibiting variable sparsity patterns, show that the FWHT variant is up to 4 times faster than the FFT-based version. Under favorable sparsity and magnitude patterns in the product matrix, our FWHT-based implementation achieves a speedup of up to 40 over DGEMM from Intel MKL, with low probability of error in the estimates. Our implementation is released as free software and comes with NumPy-compatible Python bindings.
Similar Papers
(Approximate) Matrix Multiplication via Convolutions
Data Structures and Algorithms
Makes computers solve math problems much faster.
Mixed-Precision Performance Portability of FFT-Based GPU-Accelerated Algorithms for Block-Triangular Toeplitz Matrices
Distributed, Parallel, and Cluster Computing
Makes supercomputers run faster on different parts.
Fast Plaintext-Ciphertext Matrix Multiplication from Additively Homomorphic Encryption
Cryptography and Security
Makes private computer math much faster.