Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys
By: Satanu Ghosh, Collin Holgate, Neal R. Brodnik, and more
Potential Business Impact:
Uses AI to design new super-strong metals.
We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on costly heuristic or human-in-the-loop feedback, our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.
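To make the recipe concrete, below is a minimal sketch of how physics-based feedback could be turned into DPO preference pairs, under assumptions not specified in the abstract: the helper `phase_reward` is a hypothetical stand-in for a thermodynamic phase calculation (e.g., a CALPHAD query scoring how strongly a candidate composition is predicted to form the desired BCC/B2 microstructure), and the loss shown is the standard DPO objective rather than the authors' exact implementation. In practice one would likely train with a library such as Hugging Face TRL on pairs in this prompt/chosen/rejected format.

```python
import torch
import torch.nn.functional as F


def phase_reward(composition: str) -> float:
    """Hypothetical placeholder: score a candidate alloy composition using
    equilibrium phase fractions from a thermodynamic (CALPHAD-style) backend."""
    raise NotImplementedError("Connect to a phase-calculation backend here.")


def make_preference_pair(prompt: str, candidate_a: str, candidate_b: str) -> dict:
    """Rank two model-generated alloy candidates with the physics-based reward,
    producing a single preference pair usable for DPO training."""
    chosen, rejected = (
        (candidate_a, candidate_b)
        if phase_reward(candidate_a) >= phase_reward(candidate_b)
        else (candidate_b, candidate_a)
    )
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # summed token log-probs of chosen completions under the tuned policy
    policy_rejected_logps: torch.Tensor,  # same for rejected completions
    ref_chosen_logps: torch.Tensor,       # summed token log-probs under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO objective: push the policy toward the physics-preferred sample."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```

The key design point mirrored here is that a single scalar reward from the phase calculation is enough to rank candidates, so multiple design objectives can be folded into one preference signal without human annotation.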
Similar Papers
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
Computation and Language
Makes AI understand what you like better.
Alignment as Distribution Learning: Your Preference Model is Explicitly a Language Model
Machine Learning (CS)
Makes AI better at following instructions.
Improving LLMs for Machine Translation Using Synthetic Preference Data
Computation and Language
Makes computer translations much better and more accurate.