A probabilistic foundation model for crystal structure denoising, phase classification, and order parameters
By: Hyuna Kwon, Babak Sadigh, Sebastien Hamel and more
Atomistic simulations generate large volumes of noisy structural data, but extracting phase labels, order parameters (OPs), and defect information in a way that is universal, robust, and interpretable remains challenging. Existing tools such as polyhedral template matching (PTM) and common neighbor analysis (CNA) are restricted to a small set of hand-crafted lattices (e.g., FCC/BCC/HCP), degrade under strong thermal disorder or defects, and produce hard, template-based labels without per-atom probability or confidence scores. Here we introduce a log-probability foundation model that unifies denoising, phase classification, and OP extraction within a single probabilistic framework. We reuse the MACE-MP foundation interatomic potential on crystal structures mapped to AFLOW prototypes, training it to predict per-atom, per-phase logits $l_{ac}$ (atom $a$, phase $c$) and to aggregate them into a global log-density $\log \hat{P}_\theta(\boldsymbol{r})$ whose gradient defines a conservative score field. Denoising corresponds to gradient ascent on this learned log-density, phase labels follow from $\arg\max_c l_{ac}$, and the $l_{ac}$ values act as continuous, defect-sensitive, and interpretable OPs quantifying the Euclidean distance to ideal phases. We demonstrate universality across hundreds of prototypes, robustness under strong thermal and defect-induced disorder, and accurate treatment of complex systems such as ice polymorphs, ice-water interfaces, and shock-compressed Ti.
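To make the three operations in the abstract concrete, the following is a minimal toy sketch in Python, not the authors' model. It replaces the MACE-MP network with hand-written Gaussian logits around three hypothetical "ideal phase" prototypes in a 2D descriptor space, and it assumes that per-atom logits are aggregated into the global log-density by a logsumexp over phases summed over atoms; the abstract does not specify the aggregation, so that choice, along with the names `PROTOTYPES`, `SIGMA`, `logits`, `log_density`, `score`, and `denoise`, is purely illustrative.

```python
import numpy as np

# Hypothetical "ideal phase" prototypes in a 2D descriptor space (illustrative only).
PROTOTYPES = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
SIGMA = 0.2  # assumed width of each phase's Gaussian basin


def logits(x):
    """Toy per-atom, per-phase logits l[a, c] = -|x_a - mu_c|^2 / (2 sigma^2)."""
    diff = x[:, None, :] - PROTOTYPES[None, :, :]
    return -np.sum(diff**2, axis=-1) / (2.0 * SIGMA**2)


def log_density(x):
    """Assumed aggregation: sum over atoms of logsumexp over phases."""
    l = logits(x)
    m = l.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    return float(np.sum(m[:, 0] + np.log(np.exp(l - m).sum(axis=1))))


def score(x):
    """Analytic gradient of log_density with respect to x (the score field)."""
    l = logits(x)
    w = np.exp(l - l.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)  # softmax over phases
    return (w @ PROTOTYPES - x) / SIGMA**2


def denoise(x, n_steps=50, step=0.5 * SIGMA**2):
    """Plain gradient ascent on the toy log-density."""
    x = x.copy()
    for _ in range(n_steps):
        x = x + step * score(x)
    return x


# Six "atoms": two per phase, displaced by fixed offsets that mimic thermal noise.
true_phase = np.array([0, 0, 1, 1, 2, 2])
offsets = np.array([[0.10, -0.05], [-0.08, 0.12], [0.05, 0.09],
                    [-0.11, -0.04], [0.07, -0.10], [-0.06, 0.08]])
noisy = PROTOTYPES[true_phase] + offsets

denoised = denoise(noisy)                # denoising: gradient ascent
labels = logits(noisy).argmax(axis=1)    # phase label: argmax_c l[a, c]
# In this toy, the largest logit maps back to a Euclidean distance to the nearest phase.
distance = np.sqrt(-2.0 * SIGMA**2 * logits(denoised).max(axis=1))
```

Because the score here is the exact gradient of a scalar function, the field is conservative by construction, which mirrors the property the abstract attributes to the learned model. In the actual method the logits come from a trained network acting on atomic environments, so the gradient would be obtained by automatic differentiation and not from a closed form.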