
Simulating MLB Seasons using Bayesian Inference and Random Walks

Published: May 8, 2025 | arXiv ID: 2505.05120v1

By: Simon Cha

Potential Business Impact:

Predicts baseball team wins and playoff chances.

Business Areas:
A/B Testing, Data and Analytics

As a dedicated follower of sports statistics, and with the MLB season beginning in late March, I set out to predict how many wins each team would accumulate by the end of the 162-game season. The goal was to build a simulation framework capable of forecasting the remainder of the season, using a 20-game burn-in period to establish initial estimates of team strength. My approach used a Bayesian inference model incorporating team win percentage, batting average, and pitching ERA to construct a posterior distribution of the win probability for each matchup. For each game, I sampled from the posterior and simulated the outcome as a Bernoulli trial. Because the inputs for future matchups were unobserved, I forecast batting averages using random walks and modeled pitching ERA with Kalman filters. After simulating many seasons, the model produced a distribution of win totals for all 30 teams, which can also be used to estimate each team's probability of making the postseason.
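A minimal sketch of the season-simulation loop described above, under assumptions the abstract does not state. Each team's win probability gets a Beta posterior from a uniform Beta(1, 1) prior updated with its burn-in record. The two sampled team strengths are combined into a matchup probability with Bill James's log5 formula, a swapped-in choice that is not necessarily the author's. Batting average and ERA are left out for brevity, and the three-team league, schedule, and 55-win playoff cutoff are hypothetical.

```python
import random


def posterior_params(wins, losses, prior_a=1.0, prior_b=1.0):
    """Conjugate Beta-binomial update: Beta(prior_a, prior_b) prior plus a W-L record."""
    return prior_a + wins, prior_b + losses


def log5(p_a, p_b):
    """Log5 estimate of P(team A beats team B) from the two teams' win rates."""
    denom = p_a + p_b - 2.0 * p_a * p_b
    return 0.5 if denom == 0 else (p_a - p_a * p_b) / denom


def simulate_season(records, schedule, rng):
    """Play each remaining game once and return final win totals per team.

    records maps team -> (burn-in wins, burn-in losses); schedule is a list of
    (home, away) pairs. Posteriors are not updated during the simulated season.
    """
    wins = {team: w for team, (w, _) in records.items()}
    params = {team: posterior_params(w, l) for team, (w, l) in records.items()}
    for home, away in schedule:
        # Sample each team's strength from its posterior for this game.
        p_home = rng.betavariate(*params[home])
        p_away = rng.betavariate(*params[away])
        # Bernoulli trial: the home team wins with probability log5(p_home, p_away).
        if rng.random() < log5(p_home, p_away):
            wins[home] += 1
        else:
            wins[away] += 1
    return wins


def simulate_many(records, schedule, n_sims=1000, seed=0):
    rng = random.Random(seed)
    return [simulate_season(records, schedule, rng) for _ in range(n_sims)]


# Hypothetical 20-game burn-in records for a toy three-team league.
records = {"A": (14, 6), "B": (10, 10), "C": (5, 15)}
teams = list(records)
# Hypothetical schedule: every ordered (home, away) pair played 20 times.
schedule = [(h, a) for _ in range(20) for h in teams for a in teams if h != a]
sims = simulate_many(records, schedule, n_sims=2000, seed=42)

PLAYOFF_CUTOFF = 55  # hypothetical stand-in for the real MLB playoff format
for t in teams:
    mean_wins = sum(s[t] for s in sims) / len(sims)
    p_playoffs = sum(s[t] >= PLAYOFF_CUTOFF for s in sims) / len(sims)
    print(f"{t}: mean wins {mean_wins:.1f}, P(>= {PLAYOFF_CUTOFF} wins) {p_playoffs:.2f}")
```

Drawing fresh strengths from the posterior for every game propagates uncertainty about team quality into the spread of simulated win totals, rather than treating the burn-in win percentage as exact.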
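The unobserved inputs for future games can be forecast along the lines the abstract describes. This sketch assumes a Gaussian random walk for batting average, clipped to [0, 1], and a scalar local-level Kalman filter for ERA. The function names, the noise variances `q` and `r`, the step size `step_sd`, the starting values, and the per-game ERA series are all hypothetical illustration values, not taken from the paper.

```python
import random


def random_walk_forecast(last_value, n_steps, step_sd, rng, lo=0.0, hi=1.0):
    """Forecast a path by a Gaussian random walk from the last observed value.

    Each step adds N(0, step_sd^2) noise; values are clipped to [lo, hi]
    (an assumption so batting averages stay valid proportions).
    """
    path, x = [], last_value
    for _ in range(n_steps):
        x = min(hi, max(lo, x + rng.gauss(0.0, step_sd)))
        path.append(x)
    return path


def kalman_local_level(observations, q, r, x0, p0):
    """Scalar Kalman filter for a local-level (random walk plus noise) model.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered mean and variance after the last observation.
    """
    x, p = x0, p0
    for y in observations:
        # Predict: the level carries over and its uncertainty grows by q.
        p = p + q
        # Update: blend prediction and observation using the Kalman gain k.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
    return x, p


def kalman_forecast(x, p, q, n_steps):
    """h-step-ahead forecasts: the mean stays flat, the variance grows by q per step."""
    return [(x, p + q * h) for h in range(1, n_steps + 1)]


rng = random.Random(7)
remaining = 162 - 20  # games left after a 20-game burn-in
ba_path = random_walk_forecast(0.251, n_steps=remaining, step_sd=0.002, rng=rng)

era_per_game = [3.9, 4.4, 2.8, 5.1, 3.6, 4.0, 3.2, 4.7]  # hypothetical observations
x, p = kalman_local_level(era_per_game, q=0.01, r=1.0, x0=4.0, p0=1.0)
era_forecast = kalman_forecast(x, p, q=0.01, n_steps=remaining)
```

The Kalman filter smooths noisy game-to-game ERA into a latent level, whereas the plain random walk simply lets batting average drift from its last observed value; both yield growing uncertainty the further ahead the forecast reaches.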

Page Count
3 pages

Category
Statistics: Applications