Improving the Straight-Through Estimator with Zeroth-Order Information
By: Ningfeng Yang, Tor M. Aamodt
Potential Business Impact:
Trains low-precision (quantized) AI models to higher accuracy using less computation.
We study the problem of training neural networks with quantized parameters. Learning low-precision quantized parameters is difficult because quantization blocks ordinary gradient flow; the Straight-Through Estimator (STE) works around this by enabling back-propagation, a first-order method, while recent works have instead explored zeroth-order (ZO) gradient descent for fine-tuning. We note that the STE provides high-quality but biased gradients, whereas ZO gradients are unbiased but can be expensive to compute. We thus propose First-Order-Guided Zeroth-Order Gradient Descent (FOGZO), which reduces STE bias while requiring less computation than ZO methods. Empirically, we show FOGZO improves the tradeoff between quality and training time in Quantization-Aware Pre-Training. Specifically, at the same number of iterations versus the STE, we show a 1-8% accuracy improvement for DeiT Tiny/Small, a 1-2% accuracy improvement on ResNet 18/50, and a 1-22 perplexity point improvement for LLaMA models with up to 0.3 billion parameters. For the same loss, FOGZO yields a 796× reduction in computation versus n-SPSA for a 2-layer MLP on MNIST. Code is available at https://github.com/1733116199/fogzo.
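For context, the sketch below illustrates the two building blocks the abstract contrasts: an STE-quantized forward pass and an SPSA-style zeroth-order gradient estimate. It is a generic PyTorch illustration under standard formulations of both estimators (the function names, the quantization scale of 0.1, and the loss_fn interface are assumptions), not the authors' FOGZO implementation; see the linked repository for that.

    # Illustrative sketch (assumed standard formulations, not the authors' FOGZO code)
    # of the two gradient sources contrasted in the abstract.
    import torch

    def ste_quantize(w, scale=0.1):
        # Forward pass uses the rounded (quantized) weights; the backward pass
        # treats rounding as the identity, which is what makes the STE biased.
        w_q = torch.round(w / scale) * scale
        return w + (w_q - w).detach()

    def spsa_gradient(loss_fn, w, eps=1e-3):
        # One-sample SPSA zeroth-order estimate: unbiased in expectation, but
        # each sample costs two extra loss evaluations of the full network.
        delta = torch.bernoulli(torch.full_like(w, 0.5)) * 2 - 1  # Rademacher +/-1
        return (loss_fn(w + eps * delta) - loss_fn(w - eps * delta)) / (2 * eps) * delta

As the name suggests, FOGZO uses first-order (STE) information to guide the zeroth-order estimate, trading the STE's bias against the cost of additional perturbations; the exact combination rule is given in the paper and the linked repository.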
Similar Papers
Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training
Machine Learning (CS)
Makes AI learn faster on small devices.
TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs
Machine Learning (CS)
Makes AI learn faster with less computer power.
High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator
Machine Learning (Stat)
Makes computer learning faster and more accurate.