High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
By: Kuan-Ting Lin, Ching-Te Chiu, Jheng-Yi Chang, and more
Potential Business Impact:
Makes smart cameras work faster and use less power.
Deep convolutional neural networks (DCNNs) are widely used in computer vision tasks. On edge devices, however, even inference incurs too much computational complexity and data access, and the inference latency of state-of-the-art models is impractical for real-world applications. In this paper, we propose a high-utilization, energy-aware, real-time inference deep convolutional neural network accelerator that improves on current accelerators. First, we use the 1x1 convolution kernel as the smallest unit of the computing unit, and we design a suitable computing unit based on the requirements of each model. Second, we use a Reuse Feature SRAM to store the output of the current layer on chip and use that value as the input of the next layer. Moreover, we introduce an Output Reuse Strategy and a Ring Stream Dataflow to reduce the amount of data exchanged between the chip and DRAM. Finally, we present an On-fly Pooling Module that completes the pooling-layer computation directly on chip. With the aid of the proposed methods, the implemented accelerator chip achieves an extremely high hardware utilization rate and substantially reduces data transfer on a specific model, ECNN: compared with a design without the reuse strategy, we reduce the data access amount by a factor of 533. At the same time, we retain enough computing power for real-time execution of existing image classification models such as VGG16 and MobileNet. Compared with the VWA design, we achieve a 7.52x speedup and 1.92x better energy efficiency.
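To illustrate the idea behind the Reuse Feature SRAM and Output Reuse Strategy, here is a minimal software sketch (not the paper's hardware design): a stack of 1x1 convolution layers where, with reuse enabled, each layer's output stays "on chip" and only the final result touches DRAM, while without reuse every intermediate feature map is written out and read back. The function names, counting model, and layer shapes are illustrative assumptions.

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel matrix multiply over channels:
    # x has shape (H, W, C_in), w has shape (C_in, C_out).
    return np.tensordot(x, w, axes=([2], [0]))

def run_layers(x, weights, reuse=True):
    """Run a stack of 1x1 conv + ReLU layers, counting simulated DRAM words.

    reuse=True  : intermediate outputs are kept on chip (the Reuse Feature
                  SRAM idea); only the input and final output cross to DRAM.
    reuse=False : every intermediate map is written to DRAM and read back,
                  as in an accelerator without a reuse strategy.
    """
    dram_words = x.size                       # fetch the initial input
    for w in weights:
        x = np.maximum(conv1x1(x, w), 0.0)    # conv + ReLU
        if not reuse:
            dram_words += 2 * x.size          # write out, then read back
    dram_words += x.size                      # write the final output
    return x, dram_words
```

Comparing the two modes on the same random layers shows identical outputs but far fewer simulated DRAM transfers with reuse enabled; the 533x figure reported for ECNN comes from applying this kind of reuse across the real model's layers.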
Similar Papers
A Time- and Energy-Efficient CNN with Dense Connections on Memristor-Based Chips
Hardware Architecture
Makes AI chips run faster and use less power.
RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI
Hardware Architecture
Makes smart devices run faster and use less power.
Flexible Vector Integration in Embedded RISC-V SoCs for End to End CNN Inference Acceleration
Distributed, Parallel, and Cluster Computing
Makes smart devices run AI faster and use less power.