CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound

Published: December 11, 2025 | arXiv ID: 2512.11169v1

By: Akhil S Anand, Elias Aarekol, Martin Mziray Dalseg and more

Combinatorial sequential decision-making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch and bound (B&B) algorithms. The inherent difficulty of building MILP models that accurately represent stochastic real-world problems leads to suboptimal performance in practice. Recently, machine learning methods have been applied to build MILP models optimized for decision quality rather than for how faithfully they represent the real-world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce a proof-of-concept CORL framework that end-to-end fine-tunes an MILP scheme using reinforcement learning (RL) on real-world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic policy compatible with RL. We validate the CORL method on a simple, illustrative combinatorial sequential decision-making example.
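
The sketch below illustrates one plausible reading of the abstract's core idea, not the paper's actual implementation: the cost vector of a small MILP is treated as learnable parameters, Gaussian perturbation of those parameters turns the B&B-solved MILP into a stochastic policy, and a REINFORCE-style score-function gradient updates the parameters from observed rewards. The knapsack instance, reward model, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: an MILP solved by branch and bound as a stochastic RL policy.
# Assumptions (not from the paper): toy knapsack data, Gaussian perturbation of
# the learnable cost vector, and a plain REINFORCE update without a baseline.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
n_items = 5
weights = np.array([2.0, 3.0, 4.0, 5.0, 6.0])       # assumed item weights
capacity = 10.0
true_values = np.array([3.0, 4.0, 5.0, 8.0, 9.0])   # "real world", unknown to the policy

theta = np.zeros(n_items)   # learnable MILP cost parameters
sigma, lr = 0.3, 0.05       # exploration noise scale and learning rate (assumed)

def solve_milp(cost):
    """Solve max cost @ x s.t. weights @ x <= capacity, x binary, via B&B (HiGHS)."""
    res = milp(
        c=-cost,  # scipy's milp minimizes, so negate to maximize
        constraints=LinearConstraint(weights[None, :], -np.inf, capacity),
        integrality=np.ones(n_items),
        bounds=Bounds(0, 1),
    )
    return np.round(res.x)

def reward(x):
    """Noisy return observed from the environment for decision x (illustrative)."""
    return float(true_values @ x + rng.normal(scale=0.5))

for step in range(500):
    eps = rng.normal(scale=sigma, size=n_items)   # perturb the MILP cost vector
    x = solve_milp(theta + eps)                   # sample an action from the policy
    r = reward(x)
    # Score-function (REINFORCE) gradient of the Gaussian perturbation:
    # grad_theta log p(eps) = eps / sigma^2, so the MILP itself needs no gradient.
    theta += lr * r * eps / (sigma ** 2)

print("learned cost vector:", np.round(theta, 2))
print("final decision:", solve_milp(theta))
```

Because the only gradient used is the score function of the injected noise, the combinatorial B&B solve stays a black box; this is one common way to make a discrete optimizer "differentiable enough" for policy-gradient RL, and is offered here only as a stand-in for whatever construction the paper actually uses.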

Category
Computer Science:
Artificial Intelligence