DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory

Journal Article · Nature Communications

Abstract: An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.
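Illustrative sketch (not from the article): the abstract does not state which estimator is used, so the following is only one plausible way to compute an information entropy over a set of atom-centered environment descriptors, assuming a Gaussian kernel density estimate. The descriptor vectors, the bandwidth `h`, and the function names `kernel_entropy` and `novelty` are all hypothetical choices made for illustration.

```python
import numpy as np

def kernel_entropy(X, h=1.0):
    """Entropy estimate (nats) for descriptor rows X.

    Assumed estimator: H = -(1/n) * sum_i log[(1/n) * sum_j K(x_i, x_j)],
    with an unnormalized Gaussian kernel K of bandwidth h (an assumption).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between all descriptors
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * h ** 2))
    return float(-np.mean(np.log(K.mean(axis=1))))

def novelty(y, X, h=1.0):
    """Score -log sum_j K(y, x_j): larger means y is farther from the set X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    a = -((X - y) ** 2).sum(axis=-1) / (2.0 * h ** 2)
    m = a.max()  # shift by the maximum to avoid underflow in exp
    return float(-(m + np.log(np.exp(a - m).sum())))

# Toy data: five identical environments carry no information (entropy 0),
# while four well-separated environments give entropy close to log(4).
X_same = np.zeros((5, 3))
X_far = np.arange(4)[:, None] * 100.0
print(kernel_entropy(X_same), kernel_entropy(X_far))
print(novelty([0.0, 0.0, 0.0], X_same), novelty([10.0, 0.0, 0.0], X_same))
```

Under these assumptions, a redundant dataset yields low entropy and a diverse one yields high entropy, while a positive `novelty` score flags an environment that the reference set does not cover, which is the kind of out-of-distribution signal the abstract describes.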

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); University of California, Los Angeles, CA (United States)
Sponsoring Organization:
USDOE; USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC52-07NA27344; SC0025642
OSTI ID:
2563460
Report Number(s):
LLNL--JRNL-862887; LLNL--LDRD 22-ERD-055; LLNL--LDRD 23-SI-006; 4014; PII: 59232
Journal Information:
Nature Communications, Vol. 16, Issue 1; ISSN 2041-1723
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
