EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer
By: Yi Chen, Jie Lou, Malte Wabnitz and more
Potential Business Impact:
Makes phones smarter and faster while using less power.
Depthwise separable convolution (DSC) has emerged as a crucial technique, especially for resource-constrained devices. In this paper, we propose a dual-engine architecture for a DSC hardware accelerator, which enables full utilization of the depthwise convolution (DWC) and pointwise convolution (PWC) processing elements (PEs) in all DSC layers. To determine the optimal dataflow, data reuse, and configuration of the target architecture, we conduct a design space exploration using MobileNetV1 with the CIFAR-10 dataset. In the architecture, we introduce an additional non-convolutional unit, which merges the dequantization, batch normalization (BN), ReLU, and quantization between DWC and PWC into a simple fixed-point multiplication and addition operation. This also reduces the intermediate data access between the DWC and PWC, enabling streaming operation and reducing latency. The proposed DSC dual-engine accelerator is implemented in the 22 nm FDSOI technology from GlobalFoundries and occupies an area of 0.58 mm$^2$. After signoff, it can operate at 1 GHz at the TT corner, achieving a peak energy efficiency of 13.43 TOPS/W and a throughput of 973.55 GOPS at 8-bit precision. The average energy efficiency across all DSC layers of MobileNetV1 is 11.13 TOPS/W, demonstrating substantial hardware efficiency improvements for DSC-based applications.
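To illustrate how dequantization, BN, ReLU, and quantization can collapse into one fixed-point multiply-add, here is a minimal Python sketch. It is not the authors' implementation: the function names, the per-channel scales `s_in`, `s_w`, `s_out`, the 16 fractional bits, and the unsigned 8-bit output range are all assumptions made for illustration. The idea is that every step before ReLU is affine in the integer DWC accumulator, so the steps fold into one scale `k` and one offset `b`, and ReLU is absorbed by clamping the output at zero.

```python
import math


def fold_params(gamma, beta, mean, var, s_in, s_w, s_out,
                eps=1e-5, frac_bits=16):
    """Fold dequantization, BN, and requantization into (k_fx, b_fx).

    All parameter names are hypothetical; frac_bits is an assumed
    fixed-point format, not a value taken from the paper.
    """
    inv_std = 1.0 / math.sqrt(var + eps)
    # Real-valued fused scale and offset, expressed in output LSBs.
    k = gamma * inv_std * s_in * s_w / s_out
    b = (beta - gamma * mean * inv_std) / s_out
    scale = 1 << frac_bits
    return round(k * scale), round(b * scale)


def reference(acc, gamma, beta, mean, var, s_in, s_w, s_out,
              eps=1e-5, qmax=255):
    """Unfused floating-point path: dequantize, BN, ReLU, quantize."""
    x = acc * s_in * s_w                       # dequantize the accumulator
    y = gamma * (x - mean) / math.sqrt(var + eps) + beta  # batch norm
    y = max(0.0, y)                            # ReLU
    q = math.floor(y / s_out + 0.5)            # quantize, round half up
    return min(qmax, max(0, q))


def fused(acc, k_fx, b_fx, frac_bits=16, qmax=255):
    """Fused path: one integer multiply, one add, a shift, and a clamp."""
    half = 1 << (frac_bits - 1)                # rounding constant
    v = (k_fx * acc + b_fx + half) >> frac_bits
    return min(qmax, max(0, v))                # clamp at 0 acts as ReLU


if __name__ == "__main__":
    # Illustrative per-channel values only, not taken from the paper.
    params = dict(gamma=1.2, beta=0.1, mean=0.05, var=0.8,
                  s_in=0.02, s_w=0.005, s_out=0.03)
    k_fx, b_fx = fold_params(**params)
    for acc in (-5000, 0, 1234, 20000):
        print(acc, reference(acc, **params), fused(acc, k_fx, b_fx))
```

Under these assumptions the fused path can differ from the floating-point path by at most one output LSB near rounding boundaries, because `k` and `b` are themselves rounded to fixed point. Because no intermediate floating-point tensor is produced, a unit of this kind could sit directly between the DWC and PWC engines and process values as they stream through.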
Similar Papers
RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI
Hardware Architecture
Makes smart devices run faster and use less power.
DiffAxE: Diffusion-driven Hardware Accelerator Generation and Design Space Exploration
Hardware Architecture
Finds best computer chips for AI faster.
High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
Hardware Architecture
Makes smart cameras work faster and use less power.