An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
By: Jianhai Su, Jinzhu Luo, Qi Zhang
Potential Business Impact:
Teaches robots to learn faster from past mistakes.
We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as an offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation, such as final policy recommendation and online fine-tuning. We further introduce convenient techniques to improve its effectiveness in enhancing online learning efficiency. Our extensive and systematic empirical analyses show that 1) the effectiveness of the proposed framework depends strongly on the nature of the task, 2) our proposed techniques greatly enhance its effectiveness, and 3) existing online fine-tuning methods are overall ineffective, calling for more research therein.
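To make the idea concrete, the sketch below shows one plausible reading of the framework described in the abstract: an online agent logs its own interactions, and an offline RL algorithm is periodically run as a subroutine on that logged data, either to warm-start further online learning (online fine-tuning) or to produce the final recommended policy. This is a minimal illustration, not the authors' implementation; all names (ReplayBuffer, offline_rl_train, online_agent, warm_start, etc.) are hypothetical placeholders.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores the agent's own interaction history; doubles as an offline dataset."""

    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        # transition = (state, action, reward, next_state, done)
        self.data.append(transition)

    def as_offline_dataset(self):
        return list(self.data)


def offline_rl_train(dataset):
    """Placeholder for any offline RL algorithm fit purely on logged transitions."""
    # A real implementation would train a policy from `dataset`; here we return a dummy.
    return lambda state: random.choice([0, 1])


def online_rl_with_offline_subroutine(env, online_agent, num_steps, subroutine_every):
    """Tabula rasa online RL loop that invokes an offline RL subroutine on its own history."""
    buffer = ReplayBuffer()
    offline_policy = None
    state = env.reset()

    for step in range(num_steps):
        action = online_agent.act(state)
        next_state, reward, done = env.step(action)
        buffer.add((state, action, reward, next_state, done))
        online_agent.update((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

        # Periodically repurpose the accumulated history as an offline dataset.
        if (step + 1) % subroutine_every == 0:
            offline_policy = offline_rl_train(buffer.as_offline_dataset())
            # Variant: online fine-tuning -- use the offline result to warm-start the online agent.
            online_agent.warm_start(offline_policy)

    # Variant: final policy recommendation -- return the offline-trained policy instead of the online one.
    return offline_policy
```

Under this reading, the choice between returning `offline_policy` and continuing with the warm-started online agent corresponds to the "final policy recommendation" and "online fine-tuning" variants mentioned in the abstract.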
Similar Papers
A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory
Machine Learning (CS)
Teaches computers to learn from old data.
Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning
Robotics
Teaches robots to learn from bad examples.