Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware
By: Hans Rosenberger, Rodrigo Fischer, Johanna S. Fröhlich, and more
Potential Business Impact:
Makes smart computer programs run much faster on reconfigurable chips.
As state-of-the-art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for NN inference on reconfigurable hardware such as FPGAs. This is achieved by combining pruning via regularized training, weight sharing, and linear computation coding (LCC). Contrary to common NN compression techniques, where the objective is to reduce the memory used for storing the weights of the NNs, our approach is optimized to reduce the number of additions required for inference in a hardware-friendly manner. The proposed scheme achieves competitive performance for simple multilayer perceptrons, as well as for large-scale deep NNs such as ResNet-34.
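To make the abstract's pipeline concrete, here is a minimal sketch of the general idea, not the paper's actual algorithm: magnitude pruning stands in for pruning via regularized training, and snapping weights to signed powers of two stands in for weight sharing plus LCC, since a power-of-two multiply reduces to a bit shift on an FPGA and the remaining cost is additions. The function names, the pruning fraction, and the addition-counting convention are all illustrative assumptions.

```python
# Simplified stand-in for the prune + weight-share + LCC pipeline.
# Not the paper's method: magnitude pruning replaces regularized training,
# and signed power-of-two rounding replaces the actual LCC decomposition.
import numpy as np

def compress_layer(W, prune_frac=0.5):
    """Prune small weights, then snap survivors to signed powers of two."""
    W = W.copy()
    thresh = np.quantile(np.abs(W), prune_frac)
    W[np.abs(W) < thresh] = 0.0                      # pruning step
    nz = W != 0
    # Weight sharing: every nonzero weight maps to +/- 2^k (integer k),
    # so each multiplication becomes a bit shift in hardware.
    W[nz] = np.sign(W[nz]) * 2.0 ** np.round(np.log2(np.abs(W[nz])))
    return W

def addition_count(W):
    """Additions per matrix-vector product: one per nonzero beyond the
    first in each output row (bit shifts assumed free on an FPGA)."""
    nnz_per_row = np.count_nonzero(W, axis=1)
    return int(np.sum(np.maximum(nnz_per_row - 1, 0)))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 128))
x = rng.normal(size=128)

Wc = compress_layer(W)
print("additions before:", addition_count(W))
print("additions after: ", addition_count(Wc))
print("relative output error:",
      np.linalg.norm(W @ x - Wc @ x) / np.linalg.norm(W @ x))
```

Running the sketch shows the trade-off the abstract describes: pruning and the power-of-two codebook roughly halve the addition count here, at the price of a small relative error in the layer's output.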
Similar Papers
COMponent-Aware Pruning for Accelerated Control Tasks in Latent Space Models
Robotics
Makes smart robots work with less power.
Neural Weight Compression for Language Models
Machine Learning (CS)
Makes AI models smaller and faster to use.
An Efficient Compression of Deep Neural Network Checkpoints Based on Prediction and Context Modeling
Machine Learning (CS)
Shrinks computer learning files to save space.