Adaptive Batch Size and Learning Rate Scheduler for Stochastic Gradient Descent Based on Minimization of Stochastic First-order Oracle Complexity
By: Hikaru Umeda, Hideaki Iiduka
Potential Business Impact:
Speeds up deep neural network training by automatically adjusting the batch size and learning rate during training, reducing the number of gradient computations needed.
The convergence behavior of mini-batch stochastic gradient descent (SGD) is highly sensitive to the batch size and learning rate settings. Recent theoretical studies have identified the existence of a critical batch size that minimizes stochastic first-order oracle (SFO) complexity, defined as the expected number of gradient evaluations required to reach a stationary point of the empirical loss function of a deep neural network. An adaptive scheduling strategy that leverages these theoretical findings on the critical batch size is introduced to accelerate SGD. The batch size and learning rate are adjusted based on the observed decay of the full gradient norm during training. Experiments with an adaptive joint scheduler based on this strategy demonstrate faster convergence than existing schedulers.
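A minimal sketch of how such an adaptive joint scheduler could be organized, assuming a simple rule of thumb: when the measured full gradient norm has decayed by a chosen factor, grow the batch size toward the critical batch size and rescale the learning rate. The class name `AdaptiveBatchLRScheduler`, the thresholds, and the doubling rules are illustrative assumptions, not the authors' exact update rule.

```python
# Hypothetical sketch of an adaptive batch-size / learning-rate scheduler keyed to
# the decay of the full gradient norm. All constants and names are assumptions for
# illustration; they do not reproduce the paper's precise schedule.

class AdaptiveBatchLRScheduler:
    def __init__(self, batch_size, lr, critical_batch_size,
                 decay_factor=0.5, batch_growth=2, lr_growth=2.0):
        self.batch_size = batch_size
        self.lr = lr
        self.critical_batch_size = critical_batch_size  # assumed known or estimated
        self.decay_factor = decay_factor                # required relative decay of the gradient norm
        self.batch_growth = batch_growth
        self.lr_growth = lr_growth
        self.reference_grad_norm = None                 # norm recorded at the last adjustment

    def step(self, full_grad_norm):
        """Update batch size and learning rate from the measured full gradient norm."""
        if self.reference_grad_norm is None:
            self.reference_grad_norm = full_grad_norm
            return self.batch_size, self.lr

        # Once the gradient norm has decayed below decay_factor * reference, grow the
        # batch size (capped at the critical batch size) and rescale the learning rate.
        if full_grad_norm <= self.decay_factor * self.reference_grad_norm:
            if self.batch_size < self.critical_batch_size:
                self.batch_size = min(self.batch_size * self.batch_growth,
                                      self.critical_batch_size)
                self.lr *= self.lr_growth
            self.reference_grad_norm = full_grad_norm
        return self.batch_size, self.lr


if __name__ == "__main__":
    # Toy sequence of gradient norms, for illustration only.
    sched = AdaptiveBatchLRScheduler(batch_size=32, lr=0.1, critical_batch_size=512)
    for norm in [1.0, 0.9, 0.45, 0.4, 0.2, 0.09]:
        b, lr = sched.step(norm)
        print(f"grad_norm={norm:.2f} -> batch_size={b}, lr={lr:.3f}")
```

In this sketch the training loop would compute (or approximate) the full gradient norm periodically and pass it to `step`, then rebuild the data loader and optimizer with the returned batch size and learning rate.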
Similar Papers
Optimal Growth Schedules for Batch Size and Learning Rate in SGD that Reduce SFO Complexity
Machine Learning (CS)
Speeds up SGD training by growing the batch size and learning rate on schedules designed to reduce SFO complexity.
Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size
Machine Learning (CS)
Speeds up stochastic gradient descent on Riemannian manifolds by increasing the batch size during training.
Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis
Machine Learning (CS)
Speeds up SGD with momentum through learning rate and batch size schedules, with convergence guarantees from a Lyapunov-based analysis.