Modified Equations for Stochastic Optimization
By: Stefan Perko
Potential Business Impact:
Could make the training of machine-learning models faster and easier to analyze.
In this thesis, we extend the recently introduced theory of stochastic modified equations (SMEs) for stochastic gradient optimization algorithms. In Chapter 3 we study time-inhomogeneous SDEs driven by Brownian motion. For certain SDEs we prove first- and second-order weak approximation properties and compute their linear error terms explicitly, under suitable regularity conditions. In Chapter 4 we instantiate our results for stochastic gradient descent (SGD), working out the example of linear regression explicitly. In Chapter 5 we use this example to compare the linear error terms of gradient flow and of two commonly used first-order SMEs for SGD.

In the second part of the thesis we introduce and study a novel diffusion approximation for SGD without replacement (SGDo) in the finite-data setting. In Chapter 6 we motivate and define the notion of an epoched Brownian motion (EBM). We argue that Young differential equations (YDEs) driven by EBMs serve as continuous-time models for SGDo under any shuffling scheme whose induced permutations converge to a deterministic permuton. Furthermore, we prove almost sure convergence of these YDEs in the strongly convex setting, and we derive an asymptotic upper bound on the convergence rate that is as sharp as, or sharper than, previous results for SGDo. In Chapter 7 we study scaling limits of families of random walks that share the same increments up to a random permutation. We show weak convergence under the assumption that the sequence of permutations converges to a deterministic (higher-dimensional) permuton; this permuton determines the covariance function of the limiting Gaussian process. Conversely, we show that every Gaussian process whose covariance function is determined by a permuton in this way arises as a weak scaling limit of families of random walks with shared increments. Finally, we apply our weak convergence theory to show that EBMs arise as scaling limits of random walks with finitely many distinct increments.
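For orientation (this sketch is not part of the abstract and follows the standard SME literature rather than the thesis's own statements): for SGD with learning rate \eta on a loss f, the two commonly used first-order SMEs are usually written as

    dX_t = -\nabla f(X_t)\, dt + \sqrt{\eta}\, \Sigma(X_t)^{1/2}\, dW_t,

    dX_t = -\nabla\Big( f(X_t) + \tfrac{\eta}{4}\, \lVert \nabla f(X_t) \rVert^2 \Big) dt + \sqrt{\eta}\, \Sigma(X_t)^{1/2}\, dW_t,

where \Sigma(x) denotes the covariance matrix of the stochastic gradients at x and W is a Brownian motion. The deterministic gradient flow dX_t = -\nabla f(X_t)\, dt is the noise-free baseline against which the linear error terms of such equations can be compared, as in the comparison described for Chapter 5.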
Similar Papers
Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement
Machine Learning (CS)
Makes computer learning faster and more reliable.
Nonparametric learning of stochastic differential equations from sparse and noisy data
Machine Learning (Stat)
Learns how things change from messy, limited data.
Noise estimation of SDE from a single data trajectory
Statistical Finance
Finds hidden rules in messy, changing data.