Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs
By: Idan Lev-Yehudi, Michael Novitsky, Moran Barenboim, et al.
Potential Business Impact:
Helps robots plan better decisions under uncertainty.
Autonomous systems that operate in continuous state, action, and observation spaces require planning and reasoning under uncertainty. Existing online planning methods for such POMDPs are almost exclusively sample-based, yet they forgo the power of high-dimensional gradient optimization, because integrating it into Monte Carlo Tree Search (MCTS) has proved difficult, especially in non-parametric settings. We close this gap with three contributions. First, we derive a novel action-gradient theorem for both MDPs and POMDPs in terms of transition likelihoods, making gradient information accessible during tree search. Second, we introduce the Multiple Importance Sampling (MIS) tree, which reuses samples across changing action branches, yielding consistent value estimates that enable in-search gradient steps. Third, we derive exact transition probability computation via the area formula for smooth generative models common in physical domains, a result of independent interest. These elements combine into Action-Gradient Monte Carlo Tree Search (AGMCTS), the first planner to blend non-parametric particle search with online gradient refinement in POMDPs. Across several challenging continuous MDP and POMDP benchmarks, AGMCTS outperforms widely used sample-only solvers in solution quality.
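The paper's MIS tree is not spelled out in this abstract, but the multiple-importance-sampling idea it builds on is standard: samples drawn under several proposal distributions (here, successive action branches) are combined with balance-heuristic weights so the pooled estimator remains consistent even as new proposals are added. The sketch below is illustrative only, with made-up function names and a toy one-dimensional integrand standing in for a value estimate; it is not the paper's implementation.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def mis_estimate(f, proposals, n_per, rng):
    """Estimate the integral of f with the MIS balance heuristic.

    proposals: list of (sampler, pdf) pairs; each sampler draws one point,
    each pdf evaluates that proposal's density. With balance-heuristic
    weights w_j(x) = n_j p_j(x) / sum_k n_k p_k(x), the combined estimator
    simplifies to sum over all samples of f(x) / sum_k n_k p_k(x),
    and stays unbiased no matter how samples split across proposals --
    this is what lets samples be reused when proposals change.
    """
    total = 0.0
    for sampler, _ in proposals:
        for _ in range(n_per):
            x = sampler(rng)
            denom = sum(n_per * pdf(x) for _, pdf in proposals)
            total += f(x) / denom
    return total
```

For example, pooling samples from N(0, 1) and N(1, 1.5) to estimate the integral of exp(-x^2/2) recovers sqrt(2*pi) ≈ 2.507; adding a third proposal later would only extend the denominator sum, leaving old samples usable.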
Similar Papers
Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions
Artificial Intelligence
Helps robots learn faster by guessing good moves.
Trust-Region Twisted Policy Improvement
Machine Learning (CS)
Makes computer games smarter and faster to learn.
Partially Observable Monte-Carlo Graph Search
Artificial Intelligence
Solves complex robot problems without real-time thinking.