Hyperflux: Pruning Reveals the Importance of Weights
By: Eugen Barbulescu, Antonio Alexoaie, Lucian Busoniu
Potential Business Impact:
Makes smart computer programs smaller and faster.
Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most existing methods rely on ad-hoc heuristics that offer little insight and are justified mainly by empirical results. We introduce Hyperflux, a conceptually grounded L0 pruning approach that estimates each weight's importance through its flux, the gradient's response to the weight's removal. A global pressure term continuously drives all weights toward pruning, while weights critical for accuracy are automatically regrown based on their flux. We postulate several properties that naturally follow from our framework and experimentally validate each of them. One such property is the relationship between final sparsity and pressure, for which we derive a generalized scaling-law equation that we use to design our sparsity-controlling scheduler. Empirically, we demonstrate state-of-the-art results with ResNet-50 and VGG-19 on CIFAR-10 and CIFAR-100.
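The abstract does not give implementation details, but the prune-under-pressure / regrow-by-flux idea can be illustrated with a minimal sketch. The code below is a hypothetical PyTorch illustration, not the authors' method: the names `MaskedLinear`, `train_step`, `pressure`, `prune_threshold`, and `regrow_threshold` are assumptions, and "flux" is approximated here as the magnitude of the task-loss gradient at the masked (effective) weight, i.e. how strongly the loss would respond if a removed weight were restored.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer with a per-weight binary mask (hypothetical sketch)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.register_buffer("mask", torch.ones_like(self.weight))
        self._w_eff = None

    def forward(self, x):
        # Effective weight: pruned entries are zeroed by the mask.
        self._w_eff = self.weight * self.mask
        self._w_eff.retain_grad()  # keep its gradient as the "flux" signal
        return nn.functional.linear(x, self._w_eff, self.bias)

def train_step(layer, opt, x, y, pressure=1e-3,
               prune_threshold=1e-3, regrow_threshold=1e-2):
    """One hypothetical step: pressure pushes weights toward pruning,
    flux (gradient magnitude at the effective weight) regrows critical ones."""
    opt.zero_grad()
    task_loss = nn.functional.mse_loss(layer(x), y)
    # Global pressure term: penalize surviving weights, driving them toward removal.
    pressure_loss = pressure * (layer.weight * layer.mask).abs().sum()
    (task_loss + pressure_loss).backward()
    opt.step()
    with torch.no_grad():
        flux = layer._w_eff.grad.abs()  # per-weight gradient response
        # Prune active weights whose magnitude the pressure has driven near zero.
        layer.mask[(layer.mask == 1) & (layer.weight.abs() < prune_threshold)] = 0.0
        # Regrow pruned weights whose flux indicates they matter for the loss.
        layer.mask[(layer.mask == 0) & (flux > regrow_threshold)] = 1.0
    return task_loss.item()

# Toy usage: regress random data while sparsifying the layer.
torch.manual_seed(0)
layer = MaskedLinear(32, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.05)
x, y = torch.randn(256, 32), torch.randn(256, 4)
for _ in range(200):
    train_step(layer, opt, x, y)
print(f"final sparsity: {1.0 - layer.mask.mean().item():.2%}")
```

In this sketch, raising `pressure` prunes more aggressively while the flux check protects weights whose removal would hurt the loss; the paper's scaling law between final sparsity and pressure, and its scheduler, are not reproduced here.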
Similar Papers
Efficient Column-Wise N:M Pruning on RISC-V CPU
Distributed, Parallel, and Cluster Computing
Makes AI faster by shrinking its brain.
Neural expressiveness for beyond importance model compression
Machine Learning (CS)
Makes computer programs smaller and faster.
Synaptic Pruning: A Biological Inspiration for Deep Learning Regularization
Machine Learning (CS)
Makes computer brains learn smarter and faster.