Discovering Mathematical Equations with Diffusion Language Model
By: Xiaoxu Han, Chengzhen Ning, Jinghui Zhong and more
Potential Business Impact:
Finds math rules from numbers and science.
Discovering valid and meaningful mathematical equations from observed data plays a crucial role in scientific discovery. However, this task, known as symbolic regression, remains challenging because of the vast search space and the trade-off between accuracy and complexity. In this paper, we introduce DiffuSR, a pre-training framework for symbolic regression built upon a continuous-state diffusion language model. DiffuSR employs a trainable embedding layer within the diffusion process to map discrete mathematical symbols into a continuous latent space, which lets it model the distribution of equations effectively. Through iterative denoising, DiffuSR converts an initial noisy sequence into a symbolic equation, guided by numerical data injected via a cross-attention mechanism. To improve the accuracy of the diffusion-based equation generator, we also design an inference strategy that injects logit priors into genetic programming. Experimental results on standard symbolic regression benchmarks demonstrate that DiffuSR achieves performance competitive with state-of-the-art autoregressive methods and generates more interpretable and diverse mathematical expressions.
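The idea of moving equation symbols into a continuous space, adding noise, and mapping back to symbols can be illustrated with a toy sketch. This is not DiffuSR's implementation: the vocabulary, the embedding size, the fixed random embeddings (standing in for a trainable layer), and the function names embed, add_noise, and round_to_tokens are all assumptions made for illustration, and the learned, data-conditioned denoiser is omitted entirely.

```python
import math
import random

# Hypothetical toy vocabulary and embedding size; not taken from the paper.
VOCAB = ["x", "1", "+", "*", "sin"]
DIM = 8

_rng = random.Random(0)
# Stand-in for a trainable embedding layer: one fixed random vector per symbol.
EMBED = {tok: [_rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in VOCAB}


def embed(tokens):
    """Map discrete symbols to continuous vectors."""
    return [list(EMBED[t]) for t in tokens]


def add_noise(vectors, alpha_bar, rng):
    """Forward noising: sqrt(a) * x0 + sqrt(1 - a) * eps, with eps ~ N(0, I)."""
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[keep * v + noise * rng.gauss(0.0, 1.0) for v in vec] for vec in vectors]


def round_to_tokens(vectors):
    """Map each vector back to the symbol whose embedding is nearest."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(VOCAB, key=lambda t: dist2(vec, EMBED[t])) for vec in vectors]


equation = ["sin", "x", "+", "1"]  # toy token sequence, for illustration only
clean = embed(equation)

# Partially noised latent sequence (alpha_bar = 0.5 mixes signal and noise).
noisy = add_noise(clean, alpha_bar=0.5, rng=random.Random(1))

# With alpha_bar = 1.0 no noise is added, so rounding recovers the symbols.
recovered = round_to_tokens(add_noise(clean, alpha_bar=1.0, rng=random.Random(1)))
```

In a full model, a learned denoiser conditioned on the numerical data would sit between the noising and rounding steps and be applied iteratively; the sketch only shows the two ends of that pipeline.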
Similar Papers
DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience
Machine Learning (CS)
Finds science rules from data better.
Finetuning Large Language Model as an Effective Symbolic Regressor
Computational Engineering, Finance, and Science
Finds science rules from data, better than before.
Discovering equations from data: symbolic regression in dynamical systems
Machine Learning (CS)
Finds hidden math rules in nature's patterns.