Reinforcement Learning Methods for Neighborhood Selection in Local Search
By: Yannick Molinghen, Augustin Delecluse, Renaud De Landtsheer, et al.
Reinforcement learning has recently gained traction as a means to improve combinatorial optimization methods, yet its effectiveness within local search metaheuristics specifically remains comparatively underexamined. In this study, we evaluate a range of reinforcement learning-based neighborhood selection strategies -- multi-armed bandits (upper confidence bound, $ε$-greedy) and deep reinforcement learning methods (proximal policy optimization, double deep $Q$-network) -- and compare them against multiple baselines across three different problems: the traveling salesman problem, the pickup and delivery problem with time windows, and the car sequencing problem. We show how search-specific characteristics, particularly large variations in cost due to constraint violation penalties, necessitate carefully designed reward functions to provide stable and informative learning signals. Our extensive experiments reveal that algorithm performance varies substantially across problems, although $ε$-greedy consistently ranks among the best performers. In contrast, the computational overhead of deep reinforcement learning approaches makes them competitive only when given substantially longer runtimes. These findings highlight both the promise and the practical limitations of deep reinforcement learning in local search.
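To make the bandit-based neighborhood selection concrete, the sketch below shows an $ε$-greedy selector embedded in a simple local search loop. This is a minimal illustration, not the paper's implementation: the abstract does not specify the neighborhoods, the acceptance rule, or the reward function, so the `neighborhoods` and `cost` callables and the 0/1 improvement reward used here are assumptions chosen only to show the mechanism. The binary reward also hints at why reward design matters: raw cost deltas can swing by orders of magnitude when constraint violation penalties kick in.

```python
import random

def epsilon_greedy_local_search(initial_solution, neighborhoods, cost,
                                epsilon=0.1, max_iters=10_000):
    """Local search in which the neighborhood applied at each step is chosen
    by an epsilon-greedy bandit over running average rewards.

    `neighborhoods` is a list of functions mapping a solution to a candidate
    neighbor; `cost` evaluates a solution (lower is better). Both are
    problem-specific and supplied by the caller.
    """
    solution, best_cost = initial_solution, cost(initial_solution)
    counts = [0] * len(neighborhoods)    # times each neighborhood was selected
    values = [0.0] * len(neighborhoods)  # running average reward per neighborhood

    for _ in range(max_iters):
        # Epsilon-greedy selection: explore a random neighborhood with
        # probability epsilon, otherwise exploit the highest average reward.
        if random.random() < epsilon:
            arm = random.randrange(len(neighborhoods))
        else:
            arm = max(range(len(neighborhoods)), key=lambda i: values[i])

        candidate = neighborhoods[arm](solution)
        candidate_cost = cost(candidate)

        # Illustrative reward: 1 if the move improved the incumbent, else 0.
        # Using the raw cost delta instead would expose the learner to huge
        # spikes whenever constraint violation penalties enter the cost.
        reward = 1.0 if candidate_cost < best_cost else 0.0

        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

        if candidate_cost < best_cost:  # accept improving moves only
            solution, best_cost = candidate, candidate_cost

    return solution, best_cost
```

Swapping the selection rule (e.g. for an upper-confidence-bound score) changes only the `arm` computation; the surrounding search loop stays the same, which is what makes bandit selectors cheap to integrate compared with the deep reinforcement learning policies discussed above.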
Similar Papers
An exploration for higher efficiency in multi objective optimisation with reinforcement learning
Artificial Intelligence
Teaches computers to solve hard problems better.
Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis
Machine Learning (Stat)
Helps computers solve hard puzzles faster and better.
Random is Faster than Systematic in Multi-Objective Local Search
Neural and Evolutionary Computing
Randomly checking options finds better solutions faster.