Active Learning Strategies for Efficient Machine-Learned Interatomic Potentials Across Diverse Material Systems
By: Mohammed Azeez Khan, Aaron D'Souza, Vijay Choyal
Potential Business Impact:
Finds new materials faster with smart computer guesses.
Efficient discovery of new materials demands strategies to reduce the number of costly first-principles calculations required to train predictive machine learning models. We develop and validate an active learning framework that iteratively selects informative training structures for machine-learned interatomic potentials (MLIPs) from large, heterogeneous materials databases, specifically the Materials Project and OQMD. Our framework integrates compositional and property-based descriptors with a neural network ensemble model, enabling real-time uncertainty quantification via Query-by-Committee. We systematically compare four selection strategies: random sampling (baseline), uncertainty-based sampling, diversity-based sampling (k-means clustering with farthest-point refinement), and a hybrid approach balancing both objectives. Experiments across four representative material systems (elemental carbon, silicon, iron, and a titanium-oxide compound) with 5 random seeds per configuration demonstrate that diversity sampling consistently achieves competitive or superior performance, with particularly strong advantages on complex systems like titanium-oxide (10.9% improvement, p=0.008). Our results show that intelligent data selection strategies can achieve target accuracy with 5-13% fewer labeled samples compared to random baselines. The entire pipeline executes on Google Colab in under 4 hours per system using less than 8 GB of RAM, thereby democratizing MLIP development for researchers globally with limited computational resources. Our open-source code and detailed experimental configurations are available on GitHub. This multi-system evaluation establishes practical guidelines for data-efficient MLIP training and highlights promising future directions including integration with symmetry-aware neural network architectures.
Similar Papers
How accurate are foundational machine learning interatomic potentials for heterogeneous catalysis?
Materials Science
Helps computers predict how atoms will behave.
Surface Stability Modeling with Universal Machine Learning Interatomic Potentials: A Comprehensive Cleavage Energy Benchmarking Study
Materials Science
Predicts how materials break, faster and better.
Efficient and Accurate Spatial Mixing of Machine Learned Interatomic Potentials for Materials Science
Materials Science
Speeds up computer models of materials without losing accuracy.