Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms

Published: October 28, 2025 | arXiv ID: 2510.24297v1

By: Robin Schmöcker, Alexander Dockhorn, Bodo Rosenhahn

Potential Business Impact:

Teaches computers to make better choices faster.

Business Areas:
A/B Testing, Data and Analytics

One weakness of Monte Carlo Tree Search (MCTS) is its limited sample efficiency, which can be addressed by building and using state and/or action abstractions in parallel to the tree search so that information can be shared among nodes of the same layer. The primary use of abstractions in MCTS is to enhance the Upper Confidence Bound (UCB) value during the tree policy by aggregating the visits and returns of an abstract node. However, this direct use of abstractions does not account for the case in which multiple actions with the same parent fall into the same abstract node: these actions all receive the same UCB value, so a tiebreak rule is required. In state-of-the-art abstraction algorithms such as pruned On the Go Abstractions (pruned OGA), this case went unnoticed, and a random tiebreak rule was implicitly chosen. In this paper, we propose and empirically evaluate several alternative intra-abstraction policies, several of which outperform the random policy across a majority of environments and parameter settings.
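To illustrate why a tiebreak rule is needed at all, the following minimal sketch shows abstraction-aggregated UCB selection. It is not the paper's pruned OGA implementation; the data layout (`children`, `abstract_of`) and the `least_visited` tiebreak policy are illustrative assumptions. Actions grouped into the same abstract node share aggregated statistics, so their UCB values are identical and the selection must fall back to an intra-abstraction policy.

```python
import math
import random

def ucb_select(children, abstract_of, parent_visits, c=1.41,
               tiebreak="least_visited"):
    """Select a concrete action via abstraction-aggregated UCB.

    children:      dict action -> (visits, total_return) for concrete actions
    abstract_of:   dict action -> abstract-node id (hypothetical grouping)
    parent_visits: visit count of the parent node
    """
    # Aggregate visits and returns per abstract node.
    agg = {}
    for a, (n, r) in children.items():
        g = abstract_of[a]
        gn, gr = agg.get(g, (0, 0.0))
        agg[g] = (gn + n, gr + r)

    def ucb(a):
        n, r = agg[abstract_of[a]]
        if n == 0:
            return float("inf")
        return r / n + c * math.sqrt(math.log(parent_visits) / n)

    # Actions in the same abstract node get identical UCB values,
    # so several actions can tie for the maximum.
    best = max(ucb(a) for a in children)
    tied = [a for a in children if ucb(a) == best]

    if tiebreak == "least_visited":
        # One possible intra-abstraction policy: prefer the concrete
        # action with the fewest own visits.
        return min(tied, key=lambda a: children[a][0])
    # The implicit default in prior work: break ties at random.
    return random.choice(tied)
```

For example, with two sibling actions `a` (10 visits) and `b` (2 visits) mapped to the same abstract node, both receive the same aggregated UCB value; the `least_visited` policy then picks `b`, whereas the random policy picks either.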

Country of Origin
🇩🇪 🇩🇰 Germany, Denmark

Page Count
20 pages

Category
Computer Science:
Artificial Intelligence