Data-Efficient Symbolic Regression via Foundation Model Distillation
By: Wangyang Ying, Jinghan Zhang, Haoyue Bai and more
Potential Business Impact:
Finds hidden science rules from few examples.
Discovering interpretable mathematical equations from observed data (a.k.a. equation discovery or symbolic regression) is a cornerstone of scientific discovery, enabling transparent modeling of physical, biological, and economic systems. While foundation models pre-trained on large-scale equation datasets offer a promising starting point, they often suffer from negative transfer and poor generalization when applied to small, domain-specific datasets. In this paper, we introduce EQUATE (Equation Generation via QUality-Aligned Transfer Embeddings), a data-efficient fine-tuning framework that adapts foundation models for symbolic equation discovery in low-data regimes via distillation. EQUATE combines symbolic-numeric alignment with evaluator-guided embedding optimization, enabling a principled embedding-search-generation paradigm. Our approach reformulates discrete equation search as a continuous optimization task in a shared embedding space, guided by data-equation fitness and simplicity. Experiments across three standard public benchmarks (Feynman, Strogatz, and black-box datasets) demonstrate that EQUATE consistently outperforms state-of-the-art baselines in both accuracy and robustness, while preserving low complexity and fast inference. These results highlight EQUATE as a practical and generalizable solution for data-efficient symbolic regression in foundation model distillation settings.
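The idea of recasting discrete equation search as continuous optimization in an embedding space, guided by fitness and simplicity, can be illustrated with a toy sketch. This is not EQUATE's implementation: the candidate set, the hand-placed 2-D embeddings (arranged so that better-scoring expressions lie further along one direction, standing in for a learned, quality-aligned space), the softmax-weighted surrogate evaluator, the penalty weight `lam`, and all function names are assumptions made for illustration only.

```python
import numpy as np

# Small dataset drawn from a hidden law y = x^2 + x (toy low-data regime).
x = np.linspace(-2.0, 2.0, 20)
y = x**2 + x

# Hypothetical candidates: (expression, callable, complexity, 2-D embedding).
# Embeddings and complexity counts are hand-assigned, not learned.
CANDIDATES = [
    ("x**3 + x**2 + x", lambda t: t**3 + t**2 + t, 9, (0.0, 0.0)),
    ("sin(x)",          np.sin,                    2, (1.0, 0.0)),
    ("x",               lambda t: t,               1, (0.0, 1.0)),
    ("x**2",            lambda t: t**2,            3, (1.0, 1.0)),
    ("x**2 + x",        lambda t: t**2 + t,        5, (2.0, 2.0)),
]
EMB = np.array([c[3] for c in CANDIDATES])


def quality(f, complexity, lam=0.01):
    """Data-equation fitness minus a simplicity penalty (higher is better)."""
    nmse = np.mean((f(x) - y) ** 2) / np.var(y)
    return 1.0 / (1.0 + nmse) - lam * complexity


SCORES = np.array([quality(f, c) for _, f, c, _ in CANDIDATES])


def weights(z, tau=1.0):
    """Softmax weights over candidates based on squared distance to z."""
    d2 = np.sum((EMB - z) ** 2, axis=1)
    w = np.exp(-(d2 - d2.min()) / tau)  # shift by the minimum for stability
    return w / w.sum()


def evaluator(z, tau=1.0):
    """Smooth surrogate: distance-weighted average of candidate scores."""
    return float(weights(z, tau) @ SCORES)


def evaluator_grad(z, tau=1.0):
    """Analytic gradient of the surrogate with respect to the embedding z."""
    w = weights(z, tau)
    s = w @ SCORES
    return (2.0 / tau) * ((w * (SCORES - s)) @ EMB)


def search(z0, steps=300, lr=0.5):
    """Plain gradient ascent on the surrogate in embedding space."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z + lr * evaluator_grad(z)
    return z


def decode(z):
    """Toy 'generation' step: return the expression nearest to z."""
    idx = int(np.argmin(np.sum((EMB - z) ** 2, axis=1)))
    return CANDIDATES[idx][0]


z_start = EMB[0]            # start from the worst-fitting candidate
z_best = search(z_start)    # continuous search guided by the evaluator
best_expr = decode(z_best)  # map the optimized embedding back to an equation
print(best_expr, evaluator(z_start), evaluator(z_best))
```

In this toy setup the search climbs the surrogate from the embedding of a poor candidate and decodes to `x**2 + x`, the expression that generated the data. A real system would replace the hand-placed embeddings with a learned encoder, the surrogate with a trained evaluator, and the nearest-neighbor lookup with a generative decoder.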
Similar Papers
Towards Fast Coarse-graining and Equation Discovery with Foundation Inference Models
Machine Learning (CS)
Finds hidden patterns in moving pictures.
Advancing Symbolic Discovery on Unsupervised Data: A Pre-training Framework for Non-degenerate Implicit Equation Discovery
Symbolic Computation
Finds hidden math rules in messy science data.
Discovering Mathematical Equations with Diffusion Language Model
Machine Learning (CS)
Finds math rules from numbers and science.