Predictable Gradient Manifolds in Deep Learning: Temporal Path-Length and Intrinsic Rank as a Complexity Regime
By: Anherutowa Calvo
Potential Business Impact:
Makes computers learn faster by predicting their next steps.
Deep learning optimization exhibits structure that is not captured by worst-case gradient bounds. Empirically, gradients along training trajectories are often temporally predictable and evolve within a low-dimensional subspace. In this work we formalize this observation through a measurable framework for predictable gradient manifolds. We introduce two computable quantities: a prediction-based path length that measures how well gradients can be forecast from past information, and a predictable rank that quantifies the intrinsic temporal dimension of gradient increments. We show how classical online and nonconvex optimization guarantees can be restated so that convergence and regret depend explicitly on these quantities, rather than on worst-case variation. Across convolutional networks, vision transformers, language models, and synthetic control tasks, we find that gradient trajectories are locally predictable and exhibit strong low-rank structure over time. These properties are stable across architectures and optimizers, and can be diagnosed directly from logged gradients using lightweight random projections. Our results provide a unifying lens for understanding optimization dynamics in modern deep learning, reframing standard training as operating in a low-complexity temporal regime. This perspective suggests new directions for adaptive optimizers, rank-aware tracking, and prediction-based algorithm design grounded in measurable properties of real training runs.
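To make the two diagnostics concrete, here is a minimal sketch of how a prediction-based path length and a predictable rank could be estimated from logged gradients with random projections. The function names, the last-value predictor, the sketch dimension, and the energy threshold are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def predictive_path_length(grads, predict=lambda past: past[-1]):
    """Sum of prediction errors ||g_t - ghat_t|| along a logged gradient trace.

    `grads` is a (T, d) array of flattened gradients; `predict` maps the
    history g_1..g_{t-1} to a forecast of g_t (default: last-value predictor).
    """
    grads = np.asarray(grads)
    total = 0.0
    for t in range(1, len(grads)):
        g_hat = predict(grads[:t])          # forecast from past gradients only
        total += np.linalg.norm(grads[t] - g_hat)
    return total

def predictable_rank(grads, sketch_dim=64, energy=0.99, seed=0):
    """Effective rank of gradient increments, estimated in a random sketch.

    Projects each increment g_t - g_{t-1} to `sketch_dim` dimensions with a
    Gaussian random matrix, then counts how many singular values are needed
    to capture the given fraction of spectral energy.
    """
    grads = np.asarray(grads)
    increments = np.diff(grads, axis=0)                    # (T-1, d)
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((grads.shape[1], sketch_dim)) / np.sqrt(sketch_dim)
    sketched = increments @ proj                           # (T-1, sketch_dim)
    s = np.linalg.svd(sketched, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)

# Synthetic check: gradients drifting slowly inside a 5-dimensional subspace
# should give a small path length and a predictable rank close to 5.
T, d, k = 200, 10_000, 5
basis = np.random.default_rng(1).standard_normal((k, d))
coeffs = np.cumsum(0.01 * np.random.default_rng(2).standard_normal((T, k)), axis=0)
grads = coeffs @ basis
print(predictive_path_length(grads))   # small: gradients are locally predictable
print(predictable_rank(grads))         # near k: low-dimensional increments
```

The random sketch keeps the cost independent of the model dimension, which is what makes this kind of diagnostic cheap enough to run on real training logs; richer predictors than the last-value forecast could be substituted without changing the interface.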
Similar Papers
The Geometry of Abstraction: Continual Learning via Recursive Quotienting
Machine Learning (CS)
Helps computers remember everything without forgetting.
Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization
Machine Learning (CS)
Teaches computers to learn by changing their shape.
Long-time dynamics and universality of nonconvex gradient descent
Machine Learning (CS)
Helps computers learn better, even with messy data.