Quantum-Inspired Episode Selection for Monte Carlo Reinforcement Learning via QUBO Optimization
By: Hadi Salloum, Ali Jnadi, Yaroslav Kholodov and more
Potential Business Impact:
Teaches computers to learn faster from fewer tries.
Monte Carlo (MC) reinforcement learning suffers from high sample complexity, especially in environments with sparse rewards, large state spaces, and correlated trajectories. We address these limitations by reformulating episode selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solving it with quantum-inspired samplers. Our method, MC+QUBO, integrates a combinatorial filtering step into standard MC policy evaluation: from each batch of trajectories, we select a subset that maximizes cumulative reward while promoting state-space coverage. This selection is encoded as a QUBO, where linear terms favor high-reward episodes and quadratic terms penalize redundancy. We explore both Simulated Quantum Annealing (SQA) and Simulated Bifurcation (SB) as black-box solvers within this framework. Experiments in a finite-horizon GridWorld demonstrate that MC+QUBO outperforms vanilla MC in convergence speed and final policy quality, highlighting the potential of quantum-inspired optimization as a decision-making subroutine in reinforcement learning.
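To make the selection step concrete, the following is a minimal sketch of how such a QUBO could be assembled and solved for a tiny batch. The abstract does not give the exact form of the terms, so everything specific here is an assumption for illustration: the function names `build_qubo` and `solve_brute_force`, the `redundancy_weight` parameter, the use of Jaccard overlap of visited states as the redundancy measure, and the toy data. Exhaustive search stands in for the SQA and SB samplers and is feasible only for very small batches.

```python
import itertools

import numpy as np


def build_qubo(returns, visits, redundancy_weight=1.0):
    """Build an upper-triangular matrix Q so that the energy x^T Q x is minimised.

    returns: length-n sequence of episode returns.
    visits:  (n, num_states) binary matrix; visits[i, s] = 1 if episode i visited state s.
    redundancy_weight: hypothetical knob trading reward against coverage.
    """
    returns = np.asarray(returns, dtype=float)
    visits = np.asarray(visits, dtype=float)
    n = len(returns)
    Q = np.zeros((n, n))
    # Linear terms (diagonal): a higher return lowers the energy of selecting the episode.
    Q[np.diag_indices(n)] = -returns
    # Quadratic terms: penalise selecting two episodes that visit similar states.
    # Jaccard overlap is an assumed redundancy measure, not taken from the paper.
    for i, j in itertools.combinations(range(n), 2):
        intersection = np.minimum(visits[i], visits[j]).sum()
        union = np.maximum(visits[i], visits[j]).sum()
        overlap = intersection / union if union > 0 else 0.0
        Q[i, j] = redundancy_weight * overlap
    return Q


def solve_brute_force(Q):
    """Enumerate all binary vectors; a stand-in for an SQA or SB sampler."""
    n = Q.shape[0]
    best_x, best_energy = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        energy = float(x @ Q @ x)
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


# Toy batch: episodes 0 and 1 visit identical states, episode 2 covers new ones,
# and episode 3 earns nothing while overlapping with episode 2.
returns = [1.0, 0.9, 0.5, 0.0]
visits = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
Q = build_qubo(returns, visits, redundancy_weight=2.0)
selected, energy = solve_brute_force(Q)
print(selected, energy)
```

In this toy batch the minimiser keeps episodes 0 and 2: episode 1 is dropped because it duplicates the states of episode 0, and episode 3 is dropped because it adds no reward. Only the selected episodes would then be passed to the usual MC return averaging.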
Similar Papers
Demonstrating Real Advantage of Machine-Learning-Enhanced Monte Carlo for Combinatorial Optimization
Disordered Systems and Neural Networks
Finds best solutions to hard problems faster.
Quantum Annealing for Machine Learning: Applications in Feature Selection, Instance Selection, and Clustering
Quantum Physics
Quantum computers find better patterns in data faster.
Systematic and Efficient Construction of Quadratic Unconstrained Binary Optimization Forms for High-order and Dense Interactions
Quantum Physics
Solves hard math problems for smarter computer learning.