Score: 1

ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning

Published: March 8, 2025 | arXiv ID: 2503.06101v2

By: Mingqi Yuan, Bo Li, Xin Jin, and more

Potential Business Impact:

Automates hyperparameter tuning within a single reinforcement learning run, reducing the compute cost of training while improving agent performance.

Business Areas:
A/B Testing, Data and Analytics

Hyperparameter optimization (HPO) is a billion-dollar problem in machine learning, as it significantly impacts training efficiency and model performance. However, achieving efficient and robust HPO in deep reinforcement learning (RL) remains challenging due to its high non-stationarity and computational cost. To tackle this problem, existing approaches attempt to adapt common HPO techniques (e.g., population-based training or Bayesian optimization) to the RL scenario. However, they remain sample-inefficient and computationally expensive, which limits their use across a wide range of applications. In this paper, we propose ULTHO, an ultra-lightweight yet powerful framework for fast HPO in deep RL within single runs. Specifically, we formulate the HPO process as a multi-armed bandit with clustered arms (MABC) and link it directly to long-term return optimization. ULTHO also provides a quantified and statistical perspective to filter the HPs efficiently. We test ULTHO on benchmarks including ALE, Procgen, MiniGrid, and PyBullet. Extensive experiments demonstrate that ULTHO achieves superior performance with a simple architecture, contributing to the development of advanced and automated RL systems.
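
To illustrate the general idea of a multi-armed bandit with clustered arms driving in-run hyperparameter selection, here is a minimal sketch. It is not the paper's implementation: the cluster names, candidate hyperparameter values, UCB1 selection rule, and the stand-in reward signal are all assumptions made for demonstration.

```python
# Illustrative sketch only: a UCB-style bandit with clustered arms for
# selecting hyperparameters during a single (toy) RL run. All names and
# values below are hypothetical placeholders, not ULTHO's actual setup.
import math
import random

# Each cluster groups the candidate values (arms) of one hyperparameter.
CLUSTERS = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "entropy_coef": [0.0, 0.01, 0.05],
}

counts = {c: [0] * len(arms) for c, arms in CLUSTERS.items()}
totals = {c: [0.0] * len(arms) for c, arms in CLUSTERS.items()}


def select(cluster: str, t: int) -> int:
    """Pick an arm in `cluster` via UCB1; try each arm once before exploiting."""
    for i, n in enumerate(counts[cluster]):
        if n == 0:
            return i
    return max(
        range(len(CLUSTERS[cluster])),
        key=lambda i: totals[cluster][i] / counts[cluster][i]
        + math.sqrt(2 * math.log(t) / counts[cluster][i]),
    )


def update(cluster: str, arm: int, reward: float) -> None:
    """Credit the observed return to the arm (hyperparameter value) used."""
    counts[cluster][arm] += 1
    totals[cluster][arm] += reward


# Toy loop: each phase picks one value per cluster, runs a fake training
# segment, and feeds the resulting return back to the bandit.
for t in range(1, 51):
    choice = {c: select(c, t) for c in CLUSTERS}
    hps = {c: CLUSTERS[c][i] for c, i in choice.items()}
    # Stand-in for an RL training segment; a real system would return the
    # agent's long-term return under `hps` here.
    ret = random.gauss(1.0 if hps["learning_rate"] == 3e-4 else 0.5, 0.1)
    for c, i in choice.items():
        update(c, i, ret)
```

The design intuition behind grouping arms by hyperparameter, as sketched here, is that each cluster stays small and the bandit can assign credit per hyperparameter rather than searching the full joint configuration space, which keeps the selection overhead negligible within a single run.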

Repos / Data Links

Page Count
24 pages

Category
Computer Science:
Machine Learning (CS)