Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA
By: Fangqiang Du, Sixuan Chong, Zixuan Huang and more
Potential Business Impact:
Speeds up computer analysis of big data streams.
Singular value decomposition (SVD) is widely used for dimensionality reduction and noise suppression, and it plays a pivotal role in numerous scientific and engineering applications. As matrix dimensions grow, the computational cost of SVD increases sharply, which challenges the efficiency of data analysis and signal processing systems, especially in time-sensitive scenarios involving large-scale datasets. Although various dedicated hardware architectures have been proposed to accelerate computation-intensive SVD, many of these designs suffer from limited scalability and high consumption of on-chip memory resources. Moreover, they typically overlook the computational and data transfer challenges associated with SVD, making them unsuitable for real-time processing of large-scale data stream matrices in embedded systems. In this paper, we propose a Data Stream-Based SVD processing algorithm (DSB Jacobi), which significantly reduces on-chip BRAM usage while improving computational speed, offering a practical solution for real-time SVD computation of large-scale data streams. Experimental results indicate that, compared to previous works, the proposed method reduces on-chip RAM consumption by 41.5 percent and improves computational efficiency by a factor of 23.
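The abstract names a Jacobi-based method but does not spell out its steps. For orientation only, here is a minimal sketch of the classic one-sided (Hestenes) Jacobi iteration that Jacobi-style SVD hardware commonly builds on. It is not the authors' DSB Jacobi algorithm, and the function name, `sweeps`, and `tol` are hypothetical choices made for this illustration. Each rotation reads and writes only two columns, which is the property that makes the method a natural fit for column-streaming and parallel hardware.

```python
import math


def one_sided_jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of `a` (a list of rows), in descending order.

    Illustrative software sketch of one-sided Jacobi; not the paper's design.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns: every rotation touches exactly two columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]

    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])   # squared norm of column p
                beta = sum(x * x for x in cols[q])    # squared norm of column q
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))  # dot product
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    x, y = cols[p][i], cols[q][i]
                    cols[p][i] = c * x - s * y
                    cols[q][i] = s * x + c * y
        if not rotated:
            break  # a full sweep with no rotation means convergence

    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


if __name__ == "__main__":
    print(one_sided_jacobi_singular_values([[1.0, 2.0], [3.0, 4.0]]))
```

In a sketch like this, the column pairs (p, q) within a sweep that share no column are independent of one another, so they could in principle be rotated concurrently. How the paper actually schedules rotations and buffers data on the FPGA is not stated in the abstract.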
Similar Papers
Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA
Distributed, Parallel, and Cluster Computing
Makes big data analysis faster and uses less memory.
Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
Distributed, Parallel, and Cluster Computing
Makes computers find patterns in data much faster.