Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation
By: Bojian Yin, Federico Corradi
Potential Business Impact:
Makes AI learn faster with less memory.
Backpropagation (BP) is the cornerstone of deep learning, but its reliance on global gradient synchronization limits scalability and imposes significant memory overhead. We propose Stochastic Variational Propagation (SVP), a scalable alternative that reframes training as hierarchical variational inference. SVP treats layer activations as latent variables and optimizes local Evidence Lower Bounds (ELBOs), enabling independent, local updates while preserving global coherence. However, directly applying KL divergence in layer-wise ELBOs risks inter-layer representation collapse due to excessive compression. To prevent this, SVP projects activations into low-dimensional spaces via fixed random matrices, ensuring information preservation and representational diversity. Combined with a feature alignment loss for inter-layer consistency, SVP achieves accuracy competitive with BP across diverse architectures (MLPs, CNNs, Transformers) and datasets (MNIST to ImageNet), reduces memory usage by up to 4x, and significantly improves scalability. More broadly, SVP introduces a probabilistic perspective to deep representation learning, opening pathways toward more modular and interpretable neural network design.
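To make the local-update idea concrete, below is a minimal PyTorch sketch, not the authors' code, of the scheme the abstract describes: each layer (here a hypothetical LocalBlock) receives a detached input, projects its activation through a fixed random matrix, and optimizes a purely local loss, so gradients never cross layer boundaries. The local classification head, the proj_dim size, and the simple regularizer are illustrative stand-ins for the paper's layer-wise ELBO and feature alignment terms, which are not specified here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBlock(nn.Module):
    """One layer trained with its own local objective; gradients stay inside the block."""
    def __init__(self, in_dim, out_dim, proj_dim, num_classes):
        super().__init__()
        self.layer = nn.Linear(in_dim, out_dim)
        # Fixed (non-trainable) random projection into a low-dimensional space.
        self.register_buffer("proj", torch.randn(out_dim, proj_dim) / proj_dim ** 0.5)
        # Small local head used only to form this layer's loss (an assumption, not the paper's exact ELBO).
        self.local_head = nn.Linear(proj_dim, num_classes)

    def forward(self, x):
        return F.relu(self.layer(x))

    def local_loss(self, x, y):
        h = self.forward(x.detach())        # detach: no gradient flows to earlier blocks
        z = h @ self.proj                   # fixed random projection of the activation
        ce = F.cross_entropy(self.local_head(z), y)
        reg = z.pow(2).mean()               # mild stand-in for the variational/alignment terms
        return ce + 1e-3 * reg, h.detach()  # pass the detached activation to the next block


def train_step(blocks, opts, x, y):
    """Update every block with its own optimizer, using only local losses."""
    for block, opt in zip(blocks, opts):
        loss, x = block.local_loss(x, y)
        opt.zero_grad()
        loss.backward()                     # backprop is confined to this block
        opt.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    dims = [784, 256, 128]
    blocks = [LocalBlock(dims[i], dims[i + 1], proj_dim=32, num_classes=10)
              for i in range(len(dims) - 1)]
    opts = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in blocks]
    x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
    print("local loss of last block:", train_step(blocks, opts, x, y))

Because each block's loss depends only on its own parameters, activations from earlier layers can be discarded immediately after use, which is the mechanism behind the memory savings the abstract reports.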
Similar Papers
VIKING: Deep variational inference with stochastic projections
Machine Learning (Stat)
Makes smart computer programs more accurate and reliable.
Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation
Machine Learning (CS)
Makes AI safer by knowing when it's wrong.
VISP: Volatility Informed Stochastic Projection for Adaptive Regularization
Machine Learning (CS)
Makes computer brains learn better by adding smart noise.