Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning

Published: July 9, 2025 | arXiv ID: 2507.07293v1

By: Juejing Liu, Haydn Anderson, Noah I. Waxman and more

Potential Business Impact:

Computers read science papers, predict new materials.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

Discovery in chemistry and materials science demands an ever-growing volume of prerequisite knowledge and experimental work, which gives machine learning (ML) an opportunity to play a critical role in accelerating research. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature review and (2) the training of an ML model to predict thermodynamic parameters. Our LLM-based literature review tool (LMExt) extracted information into a machine-readable structure, including stability constants for metal cation-ligand interactions and thermodynamic properties, as well as data from other domains (medical research papers and financial reports), overcoming the challenges inherent in each domain. Using the automatically acquired thermodynamic data, we trained an ML model with the CatBoost algorithm to accurately predict thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.
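To illustrate what extracting literature values "into a machine-readable structure" could look like, the following is a minimal sketch. It is not LMExt itself: the record fields (`metal`, `ligand`, `log_k`, `temperature_c`), the function `parse_records`, and the JSON layout are all hypothetical, and the numbers are placeholders rather than measured values. The sketch assumes an upstream LLM step has already returned a JSON array, and shows only how such output might be validated into typed records before it is used as training data.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StabilityConstant:
    """One extracted record (hypothetical schema, not the one used by LMExt)."""
    metal: str
    ligand: str
    log_k: float
    temperature_c: Optional[float] = None


def parse_records(raw: str) -> List[StabilityConstant]:
    """Turn a JSON array from an extraction step into typed records.

    Rows with missing keys or non-numeric values are skipped, so a single
    malformed row does not contaminate the dataset.
    """
    records = []
    for row in json.loads(raw):
        try:
            temp = row.get("temperature_c")
            records.append(
                StabilityConstant(
                    metal=str(row["metal"]),
                    ligand=str(row["ligand"]),
                    log_k=float(row["log_k"]),
                    temperature_c=None if temp is None else float(temp),
                )
            )
        except (AttributeError, KeyError, TypeError, ValueError):
            continue  # malformed row: wrong type, missing field, or bad number
    return records


# Placeholder output standing in for an LLM response (values are not real data).
EXAMPLE = """[
  {"metal": "M1", "ligand": "L1", "log_k": 1.5, "temperature_c": 25},
  {"metal": "M2", "ligand": "L2", "log_k": "2.0"},
  {"metal": "M3", "ligand": "L3", "log_k": "not reported"},
  {"metal": "M4"}
]"""

if __name__ == "__main__":
    for record in parse_records(EXAMPLE):
        print(record)
```

Validating rows individually keeps the usable records and discards only the malformed ones, which matters when the extracted table later feeds a regression model such as the CatBoost model described above.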

Country of Origin
🇺🇸 United States

Page Count
21 pages

Category
Condensed Matter:
Materials Science