LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators
By: Changhong Li, Biswajit Basu, Shreejith Shanker
Potential Business Impact:
Makes smart devices run faster using less power.
FPGAs have been shown to be a promising platform for deploying Quantised Neural Networks (QNNs) with high-speed, low-latency, and energy-efficient inference. However, the complexity of modern deep-learning models limits achievable performance on resource-constrained edge devices. While quantisation and pruning alleviate these challenges, unstructured sparsity remains underexploited due to its irregular memory access patterns. This work introduces a framework that embeds unstructured sparsity directly into dataflow accelerators, eliminating the need for dedicated sparse engines while preserving parallelism. A hardware-aware pruning strategy is further introduced to improve efficiency and streamline the design flow. On LeNet-5, the framework attains 51.6× compression and a 1.23× throughput improvement while using only 5.12% of LUTs, effectively exploiting unstructured sparsity for QNN acceleration.
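The abstract does not describe the hardware mapping in detail, but the "engine-free" idea can be pictured as follows: because quantised weights are fixed constants at synthesis time, zero-valued weights can simply be dropped from the unrolled multiply-accumulate logic, so unstructured sparsity reduces LUT usage instead of requiring a runtime sparse engine. Below is a minimal Python sketch of that idea; the emit_mac function, the signal names, and the example weights are illustrative assumptions, not the paper's actual design flow.

# Illustrative sketch only: with weights fixed at synthesis time, zero weights
# can be elided from the unrolled MAC expression, so unstructured sparsity
# directly removes logic rather than needing a sparse engine at runtime.
# emit_mac, the signal names, and the example weights are hypothetical.
def emit_mac(weights, in_prefix="x", acc="acc"):
    """Emit a Verilog-style sum-of-products that skips zero-valued weights."""
    terms = []
    for i, w in enumerate(weights):
        if w == 0:
            continue  # pruned weight: no multiplier, no wire, no LUTs
        if w == 1:
            terms.append(f"{in_prefix}[{i}]")
        elif w == -1:
            terms.append(f"(-{in_prefix}[{i}])")
        else:
            terms.append(f"({w} * {in_prefix}[{i}])")
    rhs = " + ".join(terms) if terms else "0"
    return f"assign {acc} = {rhs};"

# Example: 8 quantised weights, 5 pruned to zero (62.5% unstructured sparsity)
print(emit_mac([0, 3, 0, 0, -1, 0, 2, 0]))
# -> assign acc = (3 * x[1]) + (-x[4]) + (2 * x[6]);

In a full dataflow accelerator the same principle would presumably apply within each processing element, which is consistent with the low LUT usage and compression figures reported in the abstract.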
Similar Papers
Hardware/Software Co-Design of RISC-V Extensions for Accelerating Sparse DNNs on FPGAs
Machine Learning (CS)
Makes AI run faster by skipping unneeded math.
Enabling Dynamic Sparsity in Quantized LLM Inference
Distributed, Parallel, and Cluster Computing
Makes smart computer programs run faster on phones.
SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs
Computation and Language
Makes AI models run faster and smaller.