Engineering Compressed Matrix Multiplication with the Fast Walsh-Hadamard Transform

Published: January 14, 2026 | arXiv ID: 2601.09477v1

By: Joel Andersson, Matti Karppa

We present an implementation of Pagh's compressed matrix multiplication algorithm, a randomized algorithm that constructs sketches of matrices to compute an unbiased estimate of their product. By leveraging fast polynomial multiplication via the FFT, the algorithm achieves high performance when the product matrix is sparse or contains only a small number of entries with magnitudes significantly larger than the rest. We show empirically that the algorithm is practical and can outperform state-of-the-art DGEMM implementations when the product matrix has few nonzero entries or is otherwise dominated by a small subset of elements with large magnitude. As a minor theoretical contribution, we replace the FFT with the Fast Walsh-Hadamard Transform (FWHT) in sketched multiplication, preserving all correctness and variance guarantees of the original algorithm. Experiments with our carefully engineered multithreaded CPU implementation for dense double-precision matrices on 64-core CPU nodes, across a range of synthetic benchmarks with varying sparsity patterns, show that the FWHT variant is up to 4 times faster than the FFT-based version. Under favorable sparsity and magnitude patterns in the product matrix, our FWHT-based implementation achieves a speedup of up to 40x over DGEMM from Intel MKL, with low probability of error in the estimates. Our implementation is released as free software and comes with NumPy-compatible Python bindings.
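To make the idea in the abstract concrete, the following is a minimal NumPy sketch of compressed matrix multiplication with the FWHT in place of the FFT. It is an illustration under stated assumptions, not the authors' implementation or API: the function names `fwht` and `compressed_matmul_fwht` are hypothetical, the hash and sign functions are drawn fully at random rather than from a limited-independence family, only a single sketch is computed (no repetition or median over independent sketches), and no attention is paid to performance. Row index i of A and column index j of B are hashed to buckets h1(i) and h2(j); the product entry (i, j) is accumulated in bucket h1(i) XOR h2(j), which is what lets the Walsh-Hadamard transform stand in for the cyclic convolution used in the FFT-based version.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis.

    The last axis must have power-of-two length. Applying the transform
    twice and dividing by that length recovers the input.
    """
    x = np.array(x, dtype=np.float64)  # work on a copy
    n = x.shape[-1]
    lead = x.shape[:-1]
    h = 1
    while h < n:
        # Group entries into blocks of 2h and butterfly the two halves.
        y = x.reshape(lead + (n // (2 * h), 2, h))
        top = y[..., 0, :] + y[..., 1, :]
        bottom = y[..., 0, :] - y[..., 1, :]
        x = np.stack([top, bottom], axis=-2).reshape(lead + (n,))
        h *= 2
    return x


def compressed_matmul_fwht(A, B, b, seed=None):
    """Estimate A @ B from a single length-b sketch (b a power of two).

    Illustrative only: hashes and signs are fully random (an assumption
    made for simplicity), and the dense estimate is materialized.
    """
    if b < 1 or b & (b - 1):
        raise ValueError("b must be a power of two")
    rng = np.random.default_rng(seed)
    n, m = A.shape
    p = B.shape[1]

    h1 = rng.integers(0, b, size=n)       # bucket for each row of A
    h2 = rng.integers(0, b, size=p)       # bucket for each column of B
    s1 = rng.choice([-1.0, 1.0], size=n)  # random sign per row of A
    s2 = rng.choice([-1.0, 1.0], size=p)  # random sign per column of B

    # SA[t, k] = sum of s1[i] * A[i, k] over rows i with h1[i] == t.
    SA = np.zeros((b, m))
    np.add.at(SA, h1, s1[:, None] * A)
    # SB[t, k] = sum of s2[j] * B[k, j] over columns j with h2[j] == t.
    SB = np.zeros((b, m))
    np.add.at(SB, h2, s2[:, None] * B.T)

    # XOR-convolution theorem: transform each length-b vector, multiply
    # pointwise, sum over the inner index k, then invert the transform.
    FA = fwht(SA.T)  # shape (m, b)
    FB = fwht(SB.T)  # shape (m, b)
    sketch = fwht((FA * FB).sum(axis=0)) / b

    # Entry (i, j) is read from bucket h1[i] XOR h2[j] and un-signed.
    buckets = h1[:, None] ^ h2[None, :]
    return s1[:, None] * s2[None, :] * sketch[buckets]
```

A call such as `compressed_matmul_fwht(A, B, b=1024, seed=0)` returns a dense array with the same shape as `A @ B`. Entries of the true product that share a bucket interfere with each other, so the estimate is only useful when `b` is large relative to the number of significant entries in the product, which matches the sparsity regime the abstract describes.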

(Approximate) Matrix Multiplication via Convolutions

Data Structures and Algorithms

Makes computers solve math problems much faster.

25 Oct 2025 0

85%

Mixed-Precision Performance Portability of FFT-Based GPU-Accelerated Algorithms for Block-Triangular Toeplitz Matrices

Distributed, Parallel, and Cluster Computing

Makes supercomputers run faster on different parts.

13 Aug 2025 2

85%

Fast Plaintext-Ciphertext Matrix Multiplication from Additively Homomorphic Encryption

Cryptography and Security

Makes private computer math much faster.

20 Apr 2025 1
