High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
By: Kuan-Ting Lin, Ching-Te Chiu, Jheng-Yi Chang, and more
Potential Business Impact:
Makes smart cameras work faster and use less power.
Deep convolutional neural networks (DCNNs) are widely used in computer vision tasks. On edge devices, however, even inference incurs too much computational complexity and data access, and the inference latency of state-of-the-art models is impractical for real-world applications. In this paper, we propose a high-utilization, energy-aware, real-time inference deep convolutional neural network accelerator that improves on current accelerators. First, we use the 1x1 convolution kernel as the smallest unit of the computing unit, and we design a suitable computing unit based on the requirements of each model. Second, we use a Reuse Feature SRAM to store the output of the current layer on chip and use that value as the input of the next layer. Moreover, we introduce an Output Reuse Strategy and a Ring Stream Dataflow to reduce the amount of data exchanged between the chip and DRAM. Finally, we present an On-fly Pooling Module that completes the pooling-layer computation directly on chip. With the aid of the proposed methods, the implemented accelerator chip achieves an extremely high hardware utilization rate and substantially reduces data transfer on a specific model, ECNN: compared with a design without the reuse strategy, we reduce the data access amount by a factor of 533. At the same time, we retain enough computing power for real-time execution of existing image classification models such as VGG16 and MobileNet. Compared with the VWA design, we achieve a 7.52x speedup and 1.92x better energy efficiency.
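To illustrate the idea behind the Reuse Feature SRAM and Output Reuse Strategy, here is a minimal software sketch (not the paper's hardware design): a stack of 1x1 convolution layers where, with reuse enabled, each layer's output stays "on chip" and only the final result touches DRAM, while without reuse every intermediate feature map is written out and read back. The function names, counting model, and layer shapes are illustrative assumptions.

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel matrix multiply over channels:
    # x has shape (H, W, C_in), w has shape (C_in, C_out).
    return np.tensordot(x, w, axes=([2], [0]))

def run_layers(x, weights, reuse=True):
    """Run a stack of 1x1 conv + ReLU layers, counting simulated DRAM words.

    reuse=True  : intermediate outputs are kept on chip (the Reuse Feature
                  SRAM idea); only the input and final output cross to DRAM.
    reuse=False : every intermediate map is written to DRAM and read back,
                  as in an accelerator without a reuse strategy.
    """
    dram_words = x.size                       # fetch the initial input
    for w in weights:
        x = np.maximum(conv1x1(x, w), 0.0)    # conv + ReLU
        if not reuse:
            dram_words += 2 * x.size          # write out, then read back
    dram_words += x.size                      # write the final output
    return x, dram_words
```

Comparing the two modes on the same random layers shows identical outputs but far fewer simulated DRAM transfers with reuse enabled; the 533x figure reported for ECNN comes from applying this kind of reuse across the real model's layers.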
Similar Papers
A Time- and Energy-Efficient CNN with Dense Connections on Memristor-Based Chips
Hardware Architecture
Makes AI chips run faster and use less power.
RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI
Hardware Architecture
Makes smart devices run faster and use less power.
Flexible Vector Integration in Embedded RISC-V SoCs for End to End CNN Inference Acceleration
Distributed, Parallel, and Cluster Computing
Makes smart devices run AI faster and use less power.