Bayesian Symbolic Regression via Posterior Sampling
By: Geoffrey F. Bomarito, Patrick E. Leser
Potential Business Impact:
Finds hidden math rules, even with messy data.
Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders broader application. This paper introduces a Sequential Monte Carlo (SMC) framework for Bayesian symbolic regression that approximates the posterior distribution over symbolic expressions, improving robustness and enabling uncertainty quantification in the presence of noise. Unlike traditional genetic programming approaches, the SMC-based algorithm combines probabilistic selection, adaptive tempering, and a normalized marginal likelihood to efficiently explore the space of symbolic expressions, yielding parsimonious expressions with improved generalization. Compared to standard genetic programming baselines, the proposed method handles challenging, noisy benchmark datasets more effectively. Its reduced tendency to overfit and improved ability to discover accurate, interpretable equations pave the way for more robust symbolic regression in scientific discovery and engineering design applications.
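To make the algorithmic flow concrete, here is a minimal Python sketch of SMC-style posterior sampling over symbolic expressions with adaptive tempering, probabilistic resampling, and likelihood-based weighting. The toy expression library, Gaussian likelihood, constant-jitter move, and ESS-based tempering rule are illustrative assumptions, not the authors' implementation (the paper operates on genetic-programming-style expression trees).

```python
# Minimal sketch of SMC posterior sampling over symbolic expressions.
# The expression library, Gaussian likelihood, constant-jitter move, and
# ESS-based tempering rule are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# Noisy data from a hidden target, y = 2*x^2 + noise (illustrative).
x = np.linspace(-1, 1, 50)
y = 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Tiny library of candidate expressions (stand-ins for expression trees).
library = {
    "c*x":      lambda c: c * x,
    "c*x**2":   lambda c: c * x**2,
    "c*sin(x)": lambda c: c * np.sin(x),
    "c*exp(x)": lambda c: c * np.exp(x),
}

def log_likelihood(expr, c, sigma=0.1):
    """Gaussian log-likelihood of the data under one expression."""
    resid = y - library[expr](c)
    return -0.5 * np.sum(resid**2) / sigma**2

# Particles: (expression, constant) pairs drawn from a simple prior.
n = 200
particles = [(rng.choice(list(library)), rng.normal(0.0, 3.0)) for _ in range(n)]

beta = 0.0  # tempering exponent, raised adaptively from 0 (prior) to 1 (posterior)
while beta < 1.0:
    ll = np.array([log_likelihood(e, c) for e, c in particles])

    # Adaptive tempering: take the largest step that keeps the
    # effective sample size above half the particle count.
    def ess(d_beta):
        w = np.exp(d_beta * (ll - ll.max()))
        return w.sum() ** 2 / np.sum(w**2)

    step = 1.0 - beta
    while ess(step) < n / 2 and step > 1e-3:
        step *= 0.5
    beta += step

    # Reweight by the tempered likelihood and resample (probabilistic selection).
    log_w = step * ll
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)
    particles = [particles[i] for i in idx]

    # Jitter constants so resampled particles don't collapse (simple move step).
    particles = [(e, c + rng.normal(0.0, 0.1)) for e, c in particles]

# Approximate posterior mass over expression structures.
exprs = [e for e, _ in particles]
for name in library:
    print(f"{name:10s} posterior mass ≈ {exprs.count(name) / n:.2f}")
```

Run as-is, the sketch concentrates posterior mass on the quadratic form, illustrating how tempered reweighting and resampling favor expressions that explain the noisy data without committing to a single point estimate.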
Similar Papers
Hierarchical Bayesian Operator-induced Symbolic Regression Trees for Structural Learning of Scientific Expressions
Methodology
Finds science rules from messy data.
Discovering equations from data: symbolic regression in dynamical systems
Machine Learning (CS)
Finds hidden math rules in nature's patterns.
Towards symbolic regression for interpretable clinical decision scores
Machine Learning (CS)
Creates easy-to-understand doctor rules from patient data.