Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks
By: Lorenzo Livi
Potential Business Impact:
Makes computer learning faster and more stable.
We study how gating mechanisms in recurrent neural networks (RNNs) implicitly induce adaptive learning-rate behavior, even when training is carried out with a fixed, global learning rate. This effect arises from the coupling between state-space time scales, parametrized by the gates, and parameter-space dynamics during gradient descent. By deriving exact Jacobians for leaky-integrator and gated RNNs, we obtain a first-order expansion that makes explicit how constant, scalar, and multi-dimensional gates reshape gradient propagation, modulate effective step sizes, and introduce anisotropy in parameter updates. These findings reveal that gates not only control information flow, but also act as data-driven preconditioners that adapt optimization trajectories in parameter space. We further draw formal analogies with learning-rate schedules, momentum, and adaptive methods such as Adam. Empirical simulations corroborate these claims: in several sequence tasks, we show that gates induce lag-dependent effective learning rates and directional concentration of gradient flow, with multi-gate models matching or exceeding the anisotropic structure produced by Adam. These results highlight that optimizer-driven and gate-driven adaptivity are complementary but not equivalent mechanisms. Overall, this work provides a unified dynamical systems perspective on how gating couples state evolution with parameter updates, explaining why gated architectures achieve robust trainability and stability in practice.
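The lag-dependent effective learning rates described above can be illustrated with a minimal numerical sketch. Assuming a standard leaky-integrator update h_t = (1 - alpha) * h_{t-1} + alpha * tanh(W h_{t-1}) with a constant gate (leak) alpha — a simplification of the paper's setting, with weight scale and dimensions chosen here for illustration — the product of state-to-state Jacobians over several lags shows how the gate rescales gradient propagation, and hence the effective step size seen by parameters coupled to distant time steps:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Recurrent weights, scaled so the spectral norm stays well below 1
W = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
h0 = rng.standard_normal(n)

def jacobian_product(alpha, lags=10):
    """Product of state-to-state Jacobians over `lags` steps for a
    leaky-integrator RNN: h_t = (1 - alpha) h_{t-1} + alpha tanh(W h_{t-1}).
    Each factor is (1 - alpha) I + alpha diag(1 - tanh^2(a_t)) W."""
    h = h0.copy()
    J = np.eye(n)
    for _ in range(lags):
        a = W @ h
        D = np.diag(1.0 - np.tanh(a) ** 2)  # elementwise tanh derivative
        J = ((1.0 - alpha) * np.eye(n) + alpha * D @ W) @ J
        h = (1.0 - alpha) * h + alpha * np.tanh(a)
    return J

# A slow gate (small alpha) keeps the Jacobian product close to the
# identity, so gradients from distant lags retain a larger effective
# step size than under a fast gate (large alpha).
slow = np.linalg.norm(jacobian_product(alpha=0.1))
fast = np.linalg.norm(jacobian_product(alpha=0.9))
print(f"lag-10 gradient gain, alpha=0.1: {slow:.4f}")
print(f"lag-10 gradient gain, alpha=0.9: {fast:.4f}")
```

The gap between the two norms is the scalar-gate case of the effect the abstract describes: the gate, not the optimizer, sets a per-lag effective learning rate. Multi-dimensional gates replace the scalar alpha with a per-unit vector, which is what introduces the anisotropy in parameter updates.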
Similar Papers
Mechanistic Interpretability of RNNs emulating Hidden Markov Models
Machine Learning (CS)
Makes brains learn to make choices randomly.
Delays in Spiking Neural Networks: A State Space Model Approach
Machine Learning (CS)
Lets brain-like computers remember past events.
On Biologically Plausible Learning in Continuous Time
Machine Learning (CS)
Makes brains learn faster by timing signals.