Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities
By: Hiari Pizzini Cavagna, Daniele Cesarini, Andrea Bartolini
The increasing demand for generative AI services such as Large Language Models (LLMs) has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumption. This paper evaluates the performance of the Tenstorrent Grayskull e75 RISC-V accelerator on basic linear algebra kernels at reduced numerical precision, a fundamental operation in LLM computations. We present a detailed characterization of Grayskull's execution model, analyzing how grid size, matrix dimensions, data formats, and numerical precision impact computational efficiency. Furthermore, we compare Grayskull's performance against state-of-the-art architectures with tensor acceleration, including Intel Sapphire Rapids processors and two NVIDIA GPUs (V100 and A100). Whilst NVIDIA GPUs dominate in raw performance, Grayskull demonstrates a competitive trade-off between power consumption and computational throughput, reaching a peak of 1.55 TFLOPs/Watt with BF16.
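The TFLOPs/Watt figure quoted above combines matmul throughput with board power. A minimal Python sketch of that bookkeeping, using the conventional 2·M·N·K FLOP count for an M×K by K×N matrix multiply (helper names and the sample numbers are illustrative, not taken from the paper):

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    # An (m x k) @ (k x n) matmul needs k multiplies and k-1 adds
    # per output element; this is conventionally counted as 2*k FLOPs,
    # giving 2*m*n*k total.
    return 2 * m * n * k

def tflops_per_watt(m: int, n: int, k: int,
                    runtime_s: float, power_w: float) -> float:
    # Throughput in TFLOPs divided by average power draw in Watts.
    tflops = matmul_flops(m, n, k) / runtime_s / 1e12
    return tflops / power_w

# Illustrative example: a 8192^3 matmul (~1.1 TFLOP of work)
# timed against a hypothetical runtime and power reading.
efficiency = tflops_per_watt(8192, 8192, 8192, runtime_s=1.0, power_w=50.0)
print(f"{efficiency:.3f} TFLOPs/Watt")
```

Sweeping matrix dimensions and data formats (e.g. BF16 vs. FP32) through a harness like this is how per-format efficiency curves of the kind reported in the paper are typically produced.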