Neural Networks with Orthogonal Jacobian
By: Alex Massucco, Davide Murari, Carola-Bibiane Schönlieb
Potential Business Impact:
Makes deep computer brains learn much faster.
Very deep neural networks achieve state-of-the-art performance by extracting rich, hierarchical features. Yet, training them via backpropagation is often hindered by vanishing or exploding gradients. Existing remedies, such as orthogonal or variance-preserving initialisation and residual architectures, allow for more stable gradient propagation and the training of deeper models. In this work, we introduce a unified mathematical framework that describes a broad class of nonlinear feedforward and residual networks whose input-to-output Jacobian matrices are exactly orthogonal almost everywhere. Such a constraint forces the resulting networks to achieve perfect dynamical isometry and to train efficiently despite being very deep. Our formulation not only recovers standard architectures as particular cases but also yields new designs that match the trainability of residual networks without relying on conventional skip connections. We provide experimental evidence that perfect Jacobian orthogonality at initialisation is sufficient to stabilise training and achieve competitive performance. We compare this strategy to networks regularised to maintain Jacobian orthogonality during training and obtain comparable results. We further extend our analysis to a class of networks well approximated by those with orthogonal Jacobians and introduce networks whose Jacobians are partial isometries. These generalised models are then shown to retain the favourable trainability properties.
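To make the central quantity concrete, here is a minimal PyTorch sketch (not the authors' implementation; the toy architecture, depth, width, and tanh activation are illustrative assumptions) that builds a deep feedforward network with orthogonally initialised weights and measures how far its input-to-output Jacobian is from being orthogonal. The deviation ||JᵀJ − I||_F is the kind of quantity a regularised variant, as described in the abstract, would keep small during training.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy deep feedforward network; depth, width and activation are
# illustrative choices, not the architectures studied in the paper.
depth, width = 8, 16
layers = []
for _ in range(depth):
    linear = nn.Linear(width, width, bias=False)
    nn.init.orthogonal_(linear.weight)  # orthogonal weight initialisation
    layers += [linear, nn.Tanh()]
net = nn.Sequential(*layers)

x = torch.randn(width)

# Input-to-output Jacobian of the whole network, evaluated at x.
J = torch.autograd.functional.jacobian(net, x)

# Deviation from orthogonality: ||J^T J - I||_F. A penalty of this form
# could serve as a regulariser encouraging a (near-)orthogonal Jacobian.
deviation = torch.linalg.norm(J.T @ J - torch.eye(width))
print(f"||J^T J - I||_F = {deviation.item():.4f}")

With a generic activation such as tanh the Jacobian is only approximately orthogonal; the constructions in the paper are instead designed so that it is exactly orthogonal almost everywhere.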
Similar Papers
On the Stability of the Jacobian Matrix in Deep Neural Networks
Machine Learning (CS)
Makes smart computer programs learn better.
Shortcut Invariance: Targeted Jacobian Regularization in Disentangled Latent Space
Machine Learning (CS)
Makes AI ignore fake clues and learn better.
Almost Right: Making First-layer Kernels Nearly Orthogonal Improves Model Generalization
CV and Pattern Recognition
Helps computers learn new things better.