Forecasting MBTA Transit Dynamics: A Performance Benchmarking of Statistical and Machine Learning Models
By: Sai Siddharth Nalamalpu, Kaining Yuan, Aiden Zhou and more
Potential Business Impact:
Predicts subway ridership and transit delays better from day of week and season than from weather.
The Massachusetts Bay Transportation Authority (MBTA) is the main public transit provider in Boston, operating multiple modes of transport, including trains, subways, and buses. However, the system often faces delays and fluctuations in ridership volume, which negatively affect efficiency and passenger satisfaction. To better understand this phenomenon, this paper compares the performance of existing and unique methods to determine the best approach for predicting gated station entries in the subway system (a proxy for subway usage) and the number of delays in the overall MBTA system. To do so, this research considers factors that tend to affect public transportation, such as day of week, season, pressure, wind speed, average temperature, and precipitation. This paper evaluates the performance of 10 statistical and machine learning models on predicting next-day subway usage. For predicting the daily delay count, the number of models is extended to 11 by introducing a self-exciting point process model, representing a unique application of a point-process framework for MBTA delay modeling. This research experiments with the selective inclusion of features to determine feature importance, measuring model accuracy by Root Mean Squared Error (RMSE). Remarkably, providing either day-of-week or season data benefits predictive accuracy more substantially than providing weather data; in fact, providing weather data generally worsens performance, suggesting a tendency of the models to overfit.
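The selective-inclusion experiment described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (not MBTA data), the field names `day_of_week`, `temp_bucket`, and `entries` are hypothetical, and the group-mean predictor is a simple stand-in rather than one of the paper's 10 or 11 models. The sketch only shows the general pattern: fit the same model on different feature subsets and compare held-out RMSE.

```python
import math
import random
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def make_synthetic_days(n_days, seed=0):
    """Synthetic daily records (an assumption, NOT MBTA data).

    Entries depend on weekday vs. weekend only; temperature is pure noise
    with respect to the target.
    """
    rng = random.Random(seed)
    rows = []
    for day in range(n_days):
        dow = day % 7
        temp = rng.uniform(-5, 30)
        base = 100.0 if dow < 5 else 40.0
        rows.append({
            "day_of_week": dow,
            "temp_bucket": int(temp // 5),  # coarse 5-degree bins
            "entries": base + rng.gauss(0, 5),
        })
    return rows


def fit_group_mean(train, features):
    """Stand-in model: predict the training mean of each feature-value group.

    With an empty feature list this reduces to the global mean baseline.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for row in train:
        key = tuple(row[f] for f in features)
        sums[key][0] += row["entries"]
        sums[key][1] += 1
    global_mean = sum(r["entries"] for r in train) / len(train)
    means = {k: total / count for k, (total, count) in sums.items()}

    def predict(row):
        # Fall back to the global mean for groups unseen during training.
        return means.get(tuple(row[f] for f in features), global_mean)

    return predict


def ablation(rows, feature_sets, train_frac=0.8):
    """Fit on the first train_frac of days, report held-out RMSE per feature set."""
    split = int(len(rows) * train_frac)
    train, test = rows[:split], rows[split:]
    y_test = [r["entries"] for r in test]
    results = {}
    for name, feats in feature_sets.items():
        predict = fit_group_mean(train, feats)
        results[name] = rmse(y_test, [predict(r) for r in test])
    return results


feature_sets = {
    "none": [],
    "day_of_week": ["day_of_week"],
    "weather": ["temp_bucket"],
    "day_of_week+weather": ["day_of_week", "temp_bucket"],
}
results = ablation(make_synthetic_days(500), feature_sets)
for name, score in results.items():
    print(f"{name:>20}: RMSE = {score:.2f}")
```

Because the synthetic target is built to depend only on the calendar, the calendar-only feature set should beat the no-feature baseline here by construction; the sketch says nothing about the paper's actual results. The chronological split (rather than a random one) is a design choice that mirrors next-day forecasting, where only past days are available at prediction time.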
Similar Papers
Mixed-Effects Modeling of NYC Subway Ridership Using MTA and Weather Data
Applications
Wind makes fewer people ride the subway.
How does the Performance of the Data-driven Traffic Flow Forecasting Models deteriorate with Increasing Forecasting Horizon? An Extensive Approach Considering Statistical, Machine Learning and Deep Learning Models
Machine Learning (CS)
Predicts traffic jams before they happen.
Real-time Bus Travel Time Prediction and Reliability Quantification: A Hybrid Markov Model
Applications
Predicts bus arrival times more accurately, even with delays.