HFRWKV: A High-Performance Fully On-Chip Hardware Accelerator for RWKV
By: Liu Shijie, Zeng Zhenghao, Jiao Han, and more
Potential Business Impact:
Makes AI models run faster and use less power.
RWKV is a modern RNN architecture that approaches the performance of Transformers while processing long contexts at linear memory cost. However, its sequential computation pattern makes it difficult to exploit GPU parallelism efficiently, leading to low compute-resource utilization, and frequent off-chip weight accesses create a memory bottleneck. To address these challenges, we propose HFRWKV, an FPGA-based hardware accelerator designed specifically for RWKV. Within the matrix operation module, we propose a novel hardware-friendly hybrid-precision quantization strategy that improves performance while maintaining acceptable accuracy. For complex operations such as exponentiation and division, we introduce reusable architectures combined with lookup tables or piecewise linear approximation, algorithmically refined to balance precision against hardware resource consumption. On this foundation, we adopt a fully on-chip computing system that integrates a parallel matrix-vector processing array with an efficient pipeline architecture; through computation reordering and chunked double buffering, it eliminates data-transfer bottlenecks and improves overall throughput. We implement HFRWKV on the Alveo U50 and U280 platforms. Experimental results show that, compared to a CPU, HFRWKV achieves a 63.48$\times$ throughput improvement and a 139.17$\times$ energy-efficiency improvement; compared to GPUs, it achieves a 32.33$\times$ throughput improvement and a 171.36$\times$ energy-efficiency improvement.
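To make the exponentiation approximation concrete, below is a minimal C++ sketch of a piecewise linear (chord) approximation of exp() in fixed point. The Q16.16 format, the 16-segment count, and the input range are illustrative assumptions, not the paper's reported configuration; on an FPGA the slope/intercept table would live in a ROM and the arithmetic would be synthesized logic.

```cpp
#include <array>
#include <cstdint>
#include <cmath>
#include <cstdio>

// Piecewise-linear exp() on [-8, 0) in Q16.16 fixed point.
// Segment count and range are illustrative assumptions.
using fix32 = int32_t;                       // Q16.16
constexpr int FRAC = 16;
constexpr int SEGMENTS = 16;
constexpr float X_MIN = -8.0f, X_MAX = 0.0f;

struct Segment { fix32 slope, intercept; };  // y = slope*x + intercept

// Build the slope/intercept table offline (on an FPGA this would be a ROM).
std::array<Segment, SEGMENTS> build_table() {
    std::array<Segment, SEGMENTS> t{};
    const float step = (X_MAX - X_MIN) / SEGMENTS;
    for (int i = 0; i < SEGMENTS; ++i) {
        float x0 = X_MIN + i * step, x1 = x0 + step;
        float k = (std::exp(x1) - std::exp(x0)) / step;  // chord slope
        float b = std::exp(x0) - k * x0;                 // chord intercept
        t[i] = { fix32(k * (1 << FRAC)), fix32(b * (1 << FRAC)) };
    }
    return t;
}

fix32 exp_pwl(fix32 x, const std::array<Segment, SEGMENTS>& t) {
    const fix32 x_min = fix32(X_MIN * (1 << FRAC));
    const fix32 x_max = fix32(X_MAX * (1 << FRAC)) - 1;
    if (x < x_min) x = x_min;                // clamp into approximated range
    if (x > x_max) x = x_max;
    int idx = (x - x_min) >> 15;             // segment width 0.5 -> shift by 15
    if (idx >= SEGMENTS) idx = SEGMENTS - 1;
    const Segment& s = t[idx];
    // 64-bit product avoids overflow before rescaling back to Q16.16.
    return fix32((int64_t(s.slope) * x >> FRAC) + s.intercept);
}

int main() {
    auto table = build_table();
    float x = -1.0f;
    fix32 y = exp_pwl(fix32(x * (1 << FRAC)), table);
    std::printf("exp(%.2f) ~ %.5f (ref %.5f)\n", x, y / 65536.0, std::exp(x));
}
```

A denser table (or a hybrid lookup-table-plus-interpolation scheme, as the abstract suggests) trades ROM capacity for accuracy; this is the precision-versus-resource balance the paper refers to.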
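Similarly, the chunked double-buffering idea can be sketched as a ping-pong pair of on-chip buffers: while one weight chunk feeds the matrix-vector array, the next chunk is staged into the other buffer. The chunk size, dimensions, and helper functions below are assumptions for illustration; in hardware (e.g., HLS dataflow) the load and compute stages genuinely overlap, whereas this software model runs them in sequence.

```cpp
#include <cstddef>

constexpr std::size_t CHUNK = 256;   // rows per chunk (assumed)
constexpr std::size_t DIM   = 1024;  // vector length   (assumed)

// Burst-copy one weight chunk from off-chip memory into an on-chip buffer.
void load_chunk(const float* src, float buf[CHUNK][DIM], std::size_t chunk_idx) {
    for (std::size_t r = 0; r < CHUNK; ++r)
        for (std::size_t c = 0; c < DIM; ++c)
            buf[r][c] = src[(chunk_idx * CHUNK + r) * DIM + c];
}

// One matrix-vector slice; in hardware the inner loop is the parallel PE array.
void compute_chunk(const float buf[CHUNK][DIM], const float* x,
                   float* y, std::size_t chunk_idx) {
    for (std::size_t r = 0; r < CHUNK; ++r) {
        float acc = 0.f;
        for (std::size_t c = 0; c < DIM; ++c) acc += buf[r][c] * x[c];
        y[chunk_idx * CHUNK + r] = acc;
    }
}

void matvec_double_buffered(const float* W, const float* x, float* y,
                            std::size_t n_chunks) {
    static float ping[CHUNK][DIM], pong[CHUNK][DIM];  // on-chip BRAM/URAM stand-ins
    load_chunk(W, ping, 0);                           // prime the first buffer
    for (std::size_t i = 0; i < n_chunks; ++i) {
        float (*cur)[DIM]  = (i % 2 == 0) ? ping : pong;
        float (*next)[DIM] = (i % 2 == 0) ? pong : ping;
        if (i + 1 < n_chunks)
            load_chunk(W, next, i + 1);  // in HW this overlaps with compute
        compute_chunk(cur, x, y, i);
    }
}
```

With the two stages overlapped, chunk-load latency is hidden behind compute, which is how this scheme removes the data-transfer bottleneck the abstract describes.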
Similar Papers
AudioRWKV: Efficient and Stable Bidirectional RWKV for Audio Pattern Recognition
Sound
Makes computers understand long sounds faster.
Design and Implementation of an FPGA-Based Hardware Accelerator for Transformer
Hardware Architecture
Makes AI models run much faster and cheaper.
RWKV-X: A Linear Complexity Hybrid Language Model
Computation and Language
Lets computers understand very long stories.