Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA

Published: November 16, 2025 | arXiv ID: 2511.12461v3

By: Fangqiang Du, Sixuan Chong, Zixuan Huang and more

Potential Business Impact:

Speeds up computer analysis of big data streams.

Business Areas:

DSP Hardware

Singular value decomposition (SVD) is widely used for dimensionality reduction and noise suppression, and it plays a pivotal role in numerous scientific and engineering applications. As matrix dimensions grow, the computational cost of SVD increases significantly, which limits the efficiency of data analysis and signal processing systems, especially in time-sensitive scenarios involving large-scale datasets. Although various dedicated hardware architectures have been proposed to accelerate computationally intensive SVD, many of these designs suffer from limited scalability and high consumption of on-chip memory resources. Moreover, they typically overlook the computational and data transfer challenges associated with SVD, making them unsuitable for real-time processing of large-scale data stream matrices in embedded systems. In this paper, we propose a Data Stream-Based SVD processing algorithm (DSB Jacobi), which significantly reduces on-chip BRAM usage while improving computational speed, offering a practical solution for real-time SVD computation of large-scale data streams. Compared to previous works, our experimental results indicate that the proposed method reduces on-chip RAM consumption by 41.5% and improves computational efficiency by a factor of 23.
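For readers unfamiliar with Jacobi-type SVD, the sketch below shows the textbook one-sided (Hestenes) Jacobi iteration in plain Python. It is not the DSB Jacobi algorithm, whose details the abstract does not give, and it models neither streaming nor FPGA resources. The function name and the `sweeps` and `tol` parameters are illustrative assumptions. The sketch only illustrates why this family of methods suits parallel hardware: each rotation touches exactly two columns, so disjoint column pairs can in principle be processed independently.

```python
import math

def one_sided_jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a (list of rows) via textbook one-sided Jacobi.

    Illustrative sketch only; name and parameters are assumptions,
    not taken from the paper.
    """
    m, n = len(a), len(a[0])
    # Store the matrix column-wise: each rotation updates two columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])   # squared norm of column p
                beta = sum(x * x for x in cols[q])    # squared norm of column q
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # this pair is already orthogonal enough
                rotated = True
                # Rotation angle that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    xp, xq = cols[p][i], cols[q][i]
                    cols[p][i] = c * xp - s * xq
                    cols[q][i] = s * xp + c * xq
        if not rotated:
            break  # a full sweep with no rotations means convergence
    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)

if __name__ == "__main__":
    print(one_sided_jacobi_singular_values([[1.0, 2.0], [3.0, 4.0]]))
```

The full matrix is held in memory here for simplicity; reducing that on-chip storage for streamed matrices is the problem the paper's method addresses.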