Deep Delta Learning
By: Yifan Zhang, Yifeng Liu, Mengdi Wang, and more
Potential Business Impact:
Teaches computers to learn better by changing how they remember.
The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $\beta(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
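A minimal sketch of what such a layer might look like, assuming the Delta Operator takes a Householder-style form $\mathbf{I} - \beta(\mathbf{x})\,\mathbf{k}(\mathbf{x})\mathbf{k}(\mathbf{x})^\top$ with a unit-norm direction and a gate in $[0, 2]$; the exact parameterization and write rule used in the paper may differ, and the names `delta_layer`, `W_k`, `w_beta`, and `W_v` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def delta_layer(x, W_k, w_beta, W_v):
    """Hypothetical Deep Delta Learning layer acting on one feature vector x.

    Assumed (illustrative) form:
        h = (I - beta * k k^T) x + beta * v,
    where k = k(x) is a unit-norm reflection direction, beta = beta(x) in
    [0, 2] is the gate, and v = v(x) is the new feature written in place of
    the erased component.  beta = 0 recovers the identity shortcut, beta = 1
    an orthogonal projection removing the component of x along k, and
    beta = 2 a reflection about the hyperplane orthogonal to k.
    """
    # Data-dependent reflection direction, normalized to unit length.
    k = W_k @ x
    k = k / (np.linalg.norm(k) + 1e-8)

    # Gate in [0, 2] acting as a dynamic step size.
    beta = 2.0 * sigmoid(w_beta @ x)

    # New feature to inject (the "write" half of the update).
    v = W_v @ x

    # Synchronous rank-1 erase-and-write: the same gate scales both the
    # erasure of the old component along k and the injection of v.
    return x - beta * np.dot(k, x) * k + beta * v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    x = rng.standard_normal(d)
    W_k, W_v = rng.standard_normal((d, d)), rng.standard_normal((d, d))
    w_beta = rng.standard_normal(d)
    print(delta_layer(x, W_k, w_beta, W_v))
```

Under this reading, the gate interpolates the layer's transition operator between an identity shortcut, a projection, and a reflection, which is the dynamic spectral control the abstract describes.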
Similar Papers
LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction
Machine Learning (CS)
Makes computer brains safer from being tricked.
Geometric and Dynamic Scaling in Deep Transformers
Machine Learning (CS)
Makes super-deep computer brains learn better.
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Machine Learning (CS)
Makes AI remember longer and learn better.