Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

Published: June 14, 2025 | arXiv ID: 2506.12409v1

By: Ziwei Liu, Borui Kang, Wei Li, and more

Potential Business Impact:

Learns new things without forgetting old ones.

Business Areas:
A/B Testing, Data and Analytics

Continual learning in vision-language models (VLMs) faces critical challenges in balancing parameter efficiency, memory consumption, and optimization stability. While First-Order (FO) optimization methods (e.g., SGD) dominate current approaches, their deterministic gradients often trap models in suboptimal local minima and incur substantial memory overhead. This paper pioneers a systematic exploration of Zeroth-Order (ZO) optimization for vision-language continual learning (VLCL). We first identify the incompatibility of naive full-ZO adoption in VLCL due to modality-specific instability. To resolve this, we selectively apply ZO to either the vision or the language modality while retaining FO in the complementary branch. Furthermore, we develop a layer-wise optimization paradigm that interleaves ZO and FO across network layers, capitalizing on the heterogeneous learning dynamics of shallow versus deep representations. A key theoretical insight reveals that ZO perturbations in vision branches exhibit higher variance than their language counterparts, prompting a gradient sign normalization mechanism with modality-specific perturbation constraints. Extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art performance, reducing memory consumption by 89.1% compared to baselines. Code will be available upon publication.
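To make the branch-wise idea concrete, here is a minimal sketch of a two-point zeroth-order update with gradient sign normalization applied to a single modality branch (e.g., the vision encoder), as the abstract describes. The function name `zo_sign_step`, the `loss_fn`/`batch` interface, and the hyperparameters `mu` and `lr` are illustrative assumptions, not the authors' implementation.

```python
import torch

def zo_sign_step(branch, loss_fn, batch, lr=1e-4, mu=1e-3):
    """One zeroth-order update for a single modality branch, using a two-point
    finite-difference estimate along a random direction, followed by gradient
    sign normalization. Illustrative sketch only, not the paper's code."""
    params = [p for p in branch.parameters() if p.requires_grad]
    with torch.no_grad():
        # One random perturbation direction per parameter tensor.
        dirs = [torch.randn_like(p) for p in params]

        # Loss at theta + mu * u.
        for p, u in zip(params, dirs):
            p.add_(u, alpha=mu)
        loss_plus = loss_fn(branch, batch)

        # Loss at theta - mu * u.
        for p, u in zip(params, dirs):
            p.sub_(u, alpha=2 * mu)
        loss_minus = loss_fn(branch, batch)

        # Restore the original parameters.
        for p, u in zip(params, dirs):
            p.add_(u, alpha=mu)

        # Scalar directional-derivative estimate; only two forward passes are
        # needed, so no activations are stored for backpropagation.
        g = (loss_plus - loss_minus) / (2 * mu)

        # Sign normalization: the step size no longer depends on the
        # high-variance estimate g, only on its sign along each direction.
        for p, u in zip(params, dirs):
            p.sub_(torch.sign(g * u), alpha=lr)
```

A layer-wise variant in the spirit of the paper would apply such ZO updates only to designated layers while the remaining layers take ordinary FO (backpropagation) steps, which is where the reported memory savings come from.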

Page Count
14 pages

Category
Computer Science:
Computer Vision and Pattern Recognition