Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Data
By: Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang
Potential Business Impact:
Teaches factories to schedule jobs faster using only records of past work.
The Job-Shop Scheduling Problem (JSP) and the Flexible Job-Shop Scheduling Problem (FJSP) are canonical combinatorial optimization problems with wide-ranging applications in industrial operations. In recent years, many online reinforcement learning (RL) approaches have been proposed to learn constructive heuristics for the JSP and FJSP. Although effective, these online RL methods require millions of interactions with simulated environments that may not capture real-world complexities, and their random policy initialization leads to poor sample efficiency. To address these limitations, we introduce Conservative Discrete Quantile Actor-Critic (CDQAC), a novel offline RL algorithm that learns effective scheduling policies directly from historical data, eliminating the need for costly online interactions while retaining the ability to improve on suboptimal training data. CDQAC couples a quantile-based critic with a delayed policy update, estimating the return distribution of each machine-operation pair rather than selecting pairs outright. Our extensive experiments demonstrate CDQAC's remarkable ability to learn from diverse data sources: it consistently outperforms the heuristics that generated its training data and surpasses state-of-the-art offline and online RL baselines. CDQAC is also highly sample efficient, requiring only 10-20 training instances to learn high-quality policies. Surprisingly, we find that CDQAC performs better when trained on data generated by a random heuristic than when trained on higher-quality data from genetic algorithms and priority dispatching rules.
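A rough sketch may help make the critic idea described above concrete: estimating a return distribution per machine-operation pair with quantile regression, while the policy is updated less frequently. The PyTorch snippet below is an illustration under assumptions, not the authors' implementation; the QuantileCritic class, N_QUANTILES, the per-pair feature vectors, and the QR-DQN-style quantile Huber loss are all stand-ins for details given in the full paper.

```python
# Minimal sketch (not the paper's code) of a quantile-based critic over
# machine-operation pairs. Assumptions: each candidate pair is encoded as a
# fixed-size feature vector, and targets are precomputed return quantiles.
import torch
import torch.nn as nn

N_QUANTILES = 32  # assumed number of quantile fractions


class QuantileCritic(nn.Module):
    """Maps a machine-operation pair embedding to N_QUANTILES return quantiles."""

    def __init__(self, pair_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pair_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_QUANTILES),
        )

    def forward(self, pair_feats: torch.Tensor) -> torch.Tensor:
        # pair_feats: (batch, pair_dim) -> quantile estimates: (batch, N_QUANTILES)
        return self.net(pair_feats)


def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor, kappa: float = 1.0):
    """QR-DQN-style quantile regression Huber loss (an assumed stand-in)."""
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES
    # Pairwise errors between every target quantile and every predicted quantile.
    u = target.unsqueeze(1) - pred.unsqueeze(2)  # (batch, N, N)
    huber = torch.where(
        u.abs() <= kappa,
        0.5 * u.pow(2),
        kappa * (u.abs() - 0.5 * kappa),
    )
    # Asymmetric weighting pushes each predicted quantile toward its fraction tau.
    weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    return (weight * huber).mean()


if __name__ == "__main__":
    critic = QuantileCritic(pair_dim=16)
    feats = torch.randn(8, 16)             # 8 synthetic machine-operation pairs
    targets = torch.randn(8, N_QUANTILES)  # stand-in target return quantiles
    loss = quantile_huber_loss(critic(feats), targets)
    loss.backward()  # the critic is updated every step...
    # ...while the actor would be updated only every k-th step (delayed update).
    print(f"quantile loss: {loss.item():.4f}")
```

In a full offline training loop, the targets would come from a bootstrapped target network with a conservatism penalty, and the actor update would run only every few critic steps; both are omitted here for brevity.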
Similar Papers
Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems
Artificial Intelligence
Teaches computers to finish jobs faster.
A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints
Machine Learning (CS)
Helps factories make things faster and better.
Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures
Artificial Intelligence
Helps factories make things faster, even when problems happen.