Enhancing Symbolic Regression with Quality-Diversity and Physics-Inspired Constraints
By: J.-P. Bruneton
Potential Business Impact:
Finds hidden math rules in data.
This paper presents QDSR, an advanced symbolic regression (SR) system that integrates genetic programming (GP), a quality-diversity (QD) algorithm, and a dimensional analysis (DA) engine. Our method focuses on exact symbolic recovery of known expressions from datasets, with a particular emphasis on the Feynman-AI benchmark. On this widely used collection of 117 physics equations, QDSR achieves an exact recovery rate of 91.6%, surpassing all previous SR methods by over 20 percentage points. Our method also exhibits strong robustness to noise. Beyond QD and DA, this high success rate results from a favorable trade-off between vocabulary expressiveness and search space size: we show that significantly expanding the vocabulary with precomputed meaningful variables (e.g., dimensionless combinations and well-chosen scalar products) often reduces equation complexity, ultimately leading to better performance. Ablation studies also show that QD alone already outperforms the state of the art. This suggests that a simple integration of QD, by projecting individuals onto a QD grid, can significantly boost performance in existing algorithms without requiring major system overhauls.
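To make the idea of "projecting individuals onto a QD grid" concrete, here is a minimal MAP-Elites-style sketch in Python. It is an illustration under stated assumptions, not QDSR's implementation: the abstract does not say which descriptors define the grid, so expression length and number of distinct variables are hypothetical stand-ins, and the dictionary-based individuals and the function names `descriptor` and `project_onto_grid` are invented for this example.

```python
# Minimal quality-diversity grid sketch (MAP-Elites style).
# ASSUMPTION: the descriptors (expression length, number of distinct
# variables) are hypothetical; the abstract does not specify QDSR's own.

def descriptor(individual, max_len=30, max_vars=5):
    """Map an individual to the coordinates of its grid cell."""
    return (min(individual["length"], max_len),
            min(individual["n_vars"], max_vars))

def project_onto_grid(grid, individual):
    """Insert the individual if its cell is empty or it has a lower loss
    than the current occupant. Returns True if the grid was updated."""
    cell = descriptor(individual)
    incumbent = grid.get(cell)
    if incumbent is None or individual["loss"] < incumbent["loss"]:
        grid[cell] = individual
        return True
    return False

# Toy candidates, as a GP loop might produce them (values are made up).
population = [
    {"expr": "x0*x1",   "length": 3, "n_vars": 2, "loss": 0.50},
    {"expr": "x0+x1",   "length": 3, "n_vars": 2, "loss": 0.20},
    {"expr": "sin(x0)", "length": 2, "n_vars": 1, "loss": 0.90},
]

grid = {}
for ind in population:
    project_onto_grid(grid, ind)

# Each occupied cell now holds the best individual seen for that niche.
for cell, elite in sorted(grid.items()):
    print(cell, elite["expr"], elite["loss"])
```

In this sketch, `sin(x0)` survives despite its high loss because it occupies a different cell from the two-variable candidates. That is the diversity-preserving effect a QD grid adds on top of ordinary fitness-based selection, and an existing GP loop could sample parents from `grid.values()` instead of from a flat population.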
Similar Papers
Current Challenges of Symbolic Regression: Optimization, Selection, Model Simplification, and Benchmarking
Neural and Evolutionary Computing
Finds simpler math rules for better predictions.
Decomposable Neuro Symbolic Regression
Machine Learning (CS)
Finds simple math rules for complex data.
Domain-Informed Genetic Superposition Programming: A Case Study on SFRC Beams
Neural and Evolutionary Computing
Finds hidden rules in how things are built.