Score: 2

Efficiently Vectorized MCMC on Modern Accelerators

Published: March 20, 2025 | arXiv ID: 2503.17405v2

By: Hugh Dance , Pierre Glaser , Peter Orbanz and more

Potential Business Impact:

Makes computer programs run much faster.

Business Areas:
A/B Testing Data and Analytics

With the advent of automatic vectorization tools (e.g., JAX's $\texttt{vmap}$), writing multi-chain MCMC algorithms is often now as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem -- loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $\texttt{vmap}$ by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical form of the obtainable speed-ups using our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.

Country of Origin
🇬🇧 United Kingdom

Repos / Data Links

Page Count
23 pages

Category
Computer Science:
Mathematical Software