Design and Implementation of an FPGA-Based Hardware Accelerator for Transformer
By: Richie Li, Sicheng Chen
Potential Business Impact:
Speeds up the core matrix math of Transformer language models by roughly 7x on a low-cost edge FPGA, enabling faster and more energy-efficient AI inference on small devices.
Transformer-based large language models (LLMs) rely heavily on intensive matrix multiplications in their attention and feed-forward layers, and the Q, K, and V linear projections in the Multi-Head Self-Attention (MHA) module are a dominant performance bottleneck. In this work, we present an optimized tiled matrix-multiplication accelerator on a resource-constrained Xilinx KV260 FPGA that targets this bottleneck. Our design combines persistent on-chip storage, a two-level tiling strategy for maximal data reuse, and a systolic-like unrolled compute engine to deliver high throughput at low power. Integrated into DistilBERT for the Q, K, and V projections, the accelerator achieves a 7x speedup over an ARM CPU implementation (PyTorch) and a 200x improvement over naive NumPy, reaching up to 3.1 GFLOP/s on (64,768) x (768,3072) matrix multiplications while operating at a conservative 100 MHz. These results demonstrate the potential of FPGA-based acceleration for critical Transformer operations, pointing toward scalable and energy-efficient deep learning inference on edge devices.
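To make the dataflow concrete, the sketch below shows one plausible Vitis HLS formulation of the two-level tiled matrix multiplication described above: outer tiles are staged in on-chip buffers for reuse, and the inner reduction is unrolled into a parallel MAC array. The kernel name `tiled_gemm`, the tile sizes `T_M`/`T_N`/`T_K`, and the pragmas are illustrative assumptions, not the authors' actual implementation.

```cpp
// Hypothetical HLS-style sketch of a two-level tiled GEMM for the Q/K/V
// projection shape (64,768) x (768,3072). Tile sizes and buffer layout are
// assumptions for illustration; they are not the reported design parameters.

constexpr int M = 64, K = 768, N = 3072;     // C[M][N] = A[M][K] * B[K][N]
constexpr int T_M = 16, T_N = 16, T_K = 32;  // on-chip tile dimensions (assumed)

extern "C" void tiled_gemm(const float *A, const float *B, float *C) {
#pragma HLS INTERFACE m_axi port=A bundle=gmem0
#pragma HLS INTERFACE m_axi port=B bundle=gmem1
#pragma HLS INTERFACE m_axi port=C bundle=gmem2

    // Level-1 tiles held persistently in on-chip BRAM for reuse.
    static float a_tile[T_M][T_K];
    static float b_tile[T_K][T_N];
    float c_tile[T_M][T_N];
#pragma HLS ARRAY_PARTITION variable=a_tile complete dim=2
#pragma HLS ARRAY_PARTITION variable=b_tile complete dim=1

    for (int m0 = 0; m0 < M; m0 += T_M) {
        for (int n0 = 0; n0 < N; n0 += T_N) {
            // Clear the output accumulator tile.
            for (int i = 0; i < T_M; ++i)
                for (int j = 0; j < T_N; ++j)
                    c_tile[i][j] = 0.0f;

            for (int k0 = 0; k0 < K; k0 += T_K) {
                // Level 1: burst-load input tiles from DDR into on-chip buffers.
                for (int i = 0; i < T_M; ++i)
                    for (int k = 0; k < T_K; ++k)
                        a_tile[i][k] = A[(m0 + i) * K + (k0 + k)];
                for (int k = 0; k < T_K; ++k)
                    for (int j = 0; j < T_N; ++j)
                        b_tile[k][j] = B[(k0 + k) * N + (n0 + j)];

                // Level 2: systolic-like compute; the reduction over k is fully
                // unrolled so T_K multiply-accumulates run in parallel per output.
                for (int i = 0; i < T_M; ++i) {
                    for (int j = 0; j < T_N; ++j) {
#pragma HLS PIPELINE II=1
                        float acc = c_tile[i][j];
                        for (int k = 0; k < T_K; ++k) {
#pragma HLS UNROLL
                            acc += a_tile[i][k] * b_tile[k][j];
                        }
                        c_tile[i][j] = acc;
                    }
                }
            }

            // Write the finished output tile back to DDR.
            for (int i = 0; i < T_M; ++i)
                for (int j = 0; j < T_N; ++j)
                    C[(m0 + i) * N + (n0 + j)] = c_tile[i][j];
        }
    }
}
```

In a design of this kind, the outer (level-1) tiles bound the DDR traffic per output tile, while the unrolled inner (level-2) loop determines how many MACs the compute engine issues per cycle; the actual accelerator's tile sizes and partitioning would be chosen to fit the KV260's BRAM and DSP budget.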