Exploring Fast Fourier Transforms on the Tenstorrent Wormhole
By: Nick Brown, Jake Davies, Felix LeClair
Potential Business Impact:
Makes supercomputers use less power for math.
Whilst numerous areas of computing have adopted the RISC-V Instruction Set Architecture (ISA) wholesale in recent years, it has yet to become widespread in HPC. RISC-V accelerators offer a compelling option: the HPC community can benefit from the specialisation enabled by the open nature of the standard, without the extensive ecosystem changes required when adopting RISC-V CPUs. In this paper we explore porting the Cooley-Tukey Fast Fourier Transform (FFT) algorithm to the Tenstorrent Wormhole, a RISC-V-based PCIe accelerator. Built upon Tenstorrent's Tensix architecture, this technology decouples data movement from compute, potentially giving the programmer greater control. Exploring different optimisation techniques to address the bottlenecks inherent in data movement, we demonstrate that, for a 2D FFT, although the Wormhole n300 is slower than a server-grade 24-core Xeon Platinum CPU, it draws around 8 times less power and consumes around 2.8 times less energy than the CPU when computing the Fourier transform.
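The abstract names the Cooley-Tukey FFT and a 2D transform without showing either. The following is a minimal CPU-side sketch in plain Python. It is not the Wormhole port and not the authors' code. The radix-2 decimation-in-time variant splits a length-N input into even- and odd-indexed halves, transforms each recursively, and recombines them in "butterfly" steps using twiddle factors W_N^k = exp(-2πik/N). A 2D FFT is then assembled with the row-column method: a 1D FFT over every row, followed by a 1D FFT over every column. The names `fft_radix2` and `fft2d` are hypothetical, and the sketch assumes power-of-two lengths.

```python
import cmath
from typing import List, Sequence


def fft_radix2(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time Cooley-Tukey FFT (illustrative sketch).

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    half = n // 2
    for k in range(half):
        # Twiddle factor W_N^k = exp(-2*pi*i*k/N) applied to the odd half
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t         # butterfly: upper output
        out[k + half] = even[k] - t  # butterfly: lower output
    return out


def fft2d(grid: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """2D FFT via row-column decomposition: 1D FFT on each row, then on each column."""
    rows = [fft_radix2(row) for row in grid]
    n_rows, n_cols = len(rows), len(rows[0])
    # Column pass: gather each column across all rows, then transform it
    cols = [fft_radix2([rows[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    # Transpose back so result[r][c] lines up with the input layout
    return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    print(fft2d([[1, 2], [3, 4]]))
```

Note that the column pass reads each input element from a different row. On any architecture this gather is data movement rather than arithmetic. A production implementation would typically use an iterative, in-place formulation rather than recursion with list slicing.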
Similar Papers
Accelerating Gravitational N-Body Simulations Using the RISC-V-Based Tenstorrent Wormhole
Distributed, Parallel, and Cluster Computing
Speeds up space simulations and saves energy.
A Unified Hardware Accelerator for Fast Fourier Transform and Number Theoretic Transform
Cryptography and Security
Makes computers secure from future hacks.
Parallel FFTW on RISC-V: A Comparative Study including OpenMP, MPI, and HPX
Distributed, Parallel, and Cluster Computing
Makes many-core chips work faster for science.