Compressing CNN models for resource-constrained systems by channel and layer pruning
By: Ahmed Sadaqa, Di Liu
Potential Business Impact:
Makes smart computer programs smaller and faster.
Convolutional Neural Networks (CNNs) have achieved significant breakthroughs in various fields. However, these advancements have come with a substantial increase in network size and complexity, which makes deploying large CNNs on edge devices challenging. Consequently, model compression has emerged as a research field aimed at reducing the size and complexity of CNNs, with model pruning as one of its most prominent techniques. This paper presents a new pruning technique that combines channel and layer pruning in what is called a "hybrid pruning framework". The approach is inspired by EfficientNet, a well-known CNN architecture that scales networks up along both the channel and layer dimensions, and applies the same principle in reverse: it scales the network down through pruning. Experiments show that the hybrid approach yields a notable reduction in overall model complexity with only a minimal loss in accuracy relative to the baseline model. The reduced complexity translates into lower latency when the pruned models are deployed on an NVIDIA Jetson TX2 embedded AI device.
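The abstract does not include code, but a minimal PyTorch sketch can illustrate what a hybrid channel-plus-layer pruning pass might look like. Everything here is an assumption for illustration: the L1-norm filter scores, the helper names (filter_l1_scores, prune_channels, prune_layers), and the keep/drop ratios are not the authors' actual criteria.

```python
# Illustrative sketch of hybrid structural pruning on a toy CNN.
# Assumes groups=1 convolutions and shape-preserving layers for layer pruning.
import torch
import torch.nn as nn

def filter_l1_scores(conv: nn.Conv2d) -> torch.Tensor:
    """L1 norm of each output filter, a common channel-importance proxy."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_channels(conv: nn.Conv2d, next_conv: nn.Conv2d, keep_ratio: float):
    """Channel pruning: drop the lowest-scoring output channels of `conv`
    and the matching input channels of `next_conv`."""
    k = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(filter_l1_scores(conv), k).indices.sort().values

    slim = nn.Conv2d(conv.in_channels, k, conv.kernel_size,
                     stride=conv.stride, padding=conv.padding,
                     bias=conv.bias is not None)
    slim.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        slim.bias.data = conv.bias.data[keep].clone()

    slim_next = nn.Conv2d(k, next_conv.out_channels, next_conv.kernel_size,
                          stride=next_conv.stride, padding=next_conv.padding,
                          bias=next_conv.bias is not None)
    slim_next.weight.data = next_conv.weight.data[:, keep].clone()
    if next_conv.bias is not None:
        slim_next.bias.data = next_conv.bias.data.clone()
    return slim, slim_next

def prune_layers(blocks: nn.ModuleList, drop_ratio: float) -> nn.ModuleList:
    """Layer pruning: remove whole shape-preserving conv layers whose mean
    filter score is lowest (only valid when input/output shapes match)."""
    scores = [filter_l1_scores(b).mean().item() for b in blocks]
    n_drop = int(len(blocks) * drop_ratio)
    dropped = set(sorted(range(len(blocks)), key=lambda i: scores[i])[:n_drop])
    return nn.ModuleList(b for i, b in enumerate(blocks) if i not in dropped)

# Usage on toy layers (hypothetical ratios):
conv1 = nn.Conv2d(3, 32, 3, padding=1)
conv2 = nn.Conv2d(32, 64, 3, padding=1)
conv1, conv2 = prune_channels(conv1, conv2, keep_ratio=0.5)    # 32 -> 16 channels

blocks = nn.ModuleList(nn.Conv2d(64, 64, 3, padding=1) for _ in range(4))
blocks = prune_layers(blocks, drop_ratio=0.25)                 # 4 -> 3 layers
```

In practice such structural pruning is typically followed by fine-tuning to recover accuracy, and the channel and layer ratios would be chosen jointly, mirroring EfficientNet's compound scaling in reverse.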
Similar Papers
Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression
Neural and Evolutionary Computing
Makes smart computer programs smaller and faster.
Pruning Everything, Everywhere, All at Once
CV and Pattern Recognition
Makes smart computer programs smaller and faster.
Towards Adaptive Deep Learning: Model Elasticity via Prune-and-Grow CNN Architectures
Machine Learning (CS)
Lets smart programs use less power on phones.