Abstract
Machine learning (ML) of interatomic potentials shows great promise for accelerating scientific simulation, e.g., by emulating expensive computations at high accuracy but at a much reduced computational cost. Training datasets are computed with a computationally expensive ab initio quantum mechanics method, density functional theory (DFT). Trained on these data, an ML model can predict energies and forces for new atomic configurations very successfully. A critical factor is the quality and diversity of the training dataset. Thus, a highly automated approach to dataset construction, based on an active learning framework and suited to materials physics, has been designed.
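The active learning scheme begins with fully randomized atomic configurations. Many molecular dynamics (MD) trajectories are then simulated using the current ML potentials, each initialized from a random, disordered configuration. The temperature is varied during these simulations to diversify the sampled configurations. The variance of the predictions of the eight neural networks in an ensemble is analyzed to determine whether the model is operating as expected: if the ensemble variance exceeds a threshold, collecting more data would likely help the model. In that case, the MD trajectory is terminated, and the final atomic configuration is placed on a queue (an SQL database) for DFT calculation and then added to the training dataset. Periodically, the ML model is retrained on the updated training dataset. This active learning loop is iterated until the MD simulations become prohibitively expensive. After many active learning iterations, the MD simulations will hopefully be robust enough to support nucleation. In this sense, the active learning scheme must automatically discover the important low-energy and nonequilibrium physics.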
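The loop described above can be illustrated with a minimal Python sketch. It is a self-contained toy written under explicit assumptions and is not ALF's actual code or API: the eight ensemble members are hypothetical Lennard-Jones-like energy functions with randomly perturbed parameters rather than trained neural networks, the "MD" update is a temperature-scaled random displacement rather than a real integrator, the uncertainty measure is assumed to be the per-atom standard deviation of the ensemble's energy predictions, and the queue is an in-memory SQLite table with a hypothetical `dft_queue` schema. The threshold value is arbitrary, and the DFT labeling and periodic retraining steps are omitted.

```python
import json
import math
import random
import sqlite3
import statistics

N_MEMBERS = 8  # ensemble size stated in the abstract


def make_member(rng):
    """HYPOTHETICAL stand-in for one trained network: a Lennard-Jones-like
    pair energy whose parameters are randomly perturbed per member."""
    eps = 1.0 + rng.gauss(0.0, 0.05)
    sigma = 1.0 + rng.gauss(0.0, 0.02)

    def energy(positions):
        total = 0.0
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                sr6 = (sigma / math.dist(positions[i], positions[j])) ** 6
                total += 4.0 * eps * (sr6 * sr6 - sr6)
        return total

    return energy


def random_configuration(rng, n_atoms, box, min_dist=1.2):
    """Fully random, disordered positions in a cubic box (overlaps rejected)."""
    positions = []
    while len(positions) < n_atoms:
        trial = [rng.uniform(0.0, box) for _ in range(3)]
        if all(math.dist(trial, p) >= min_dist for p in positions):
            positions.append(trial)
    return positions


def ensemble_std_per_atom(ensemble, positions):
    """ASSUMED uncertainty measure: std of the members' energies, per atom."""
    energies = [member(positions) for member in ensemble]
    return statistics.pstdev(energies) / len(positions)


def run_trajectory(rng, ensemble, positions, temperature, threshold, max_steps):
    """ASSUMPTION: toy dynamics, not a real MD integrator. Each step applies
    random displacements whose size grows with temperature."""
    step = 0.05 * math.sqrt(temperature)
    for _ in range(max_steps):
        positions = [[x + rng.gauss(0.0, step) for x in atom] for atom in positions]
        if ensemble_std_per_atom(ensemble, positions) > threshold:
            return positions  # ensemble disagrees: stop and hand off for DFT
    return None  # stayed where the ensemble agrees; nothing to label


def active_learning_round(seed=0, n_traj=20, threshold=0.05):
    """One round of sampling; DFT labeling and retraining are omitted."""
    rng = random.Random(seed)
    ensemble = [make_member(rng) for _ in range(N_MEMBERS)]
    db = sqlite3.connect(":memory:")  # HYPOTHETICAL queue schema below
    db.execute(
        "CREATE TABLE dft_queue (id INTEGER PRIMARY KEY, positions TEXT, status TEXT)"
    )
    for _ in range(n_traj):
        start = random_configuration(rng, n_atoms=8, box=4.0)
        temperature = rng.uniform(0.1, 2.0)  # vary T to diversify sampling
        final = run_trajectory(rng, ensemble, start, temperature, threshold, 200)
        if final is not None:
            db.execute(
                "INSERT INTO dft_queue (positions, status) VALUES (?, 'pending')",
                (json.dumps(final),),
            )
    db.commit()
    return db


if __name__ == "__main__":
    queue = active_learning_round()
    count = queue.execute("SELECT COUNT(*) FROM dft_queue").fetchone()[0]
    print(count, "configurations queued for DFT")
```

Because a standard deviation is never negative, `active_learning_round(threshold=-1.0)` stops every trajectory after its first step, while `threshold=float("inf")` queues nothing; values in between control how far trajectories run before their final configurations are sent for labeling.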
- Developers:
- Nebgen, Benjamin; Smith, Justin; Lubbers, Nicholas; Barros, Kipton; Li, Yin Wai; Li, Wei
- Release Date:
- 2023-01-04
- Project Type:
- Open Source, Publicly Available Repository
- Software Type:
- Scientific
- Licenses:
- BSD 3-clause "New" or "Revised" License
- Sponsoring Org.:
- USDOE Laboratory Directed Research and Development (LDRD) Program
- Primary Award/Contract Number:
- AC52-06NA25396
- Code ID:
- 115537
- Site Accession Number:
- C22072
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Country of Origin:
- United States
Citation Formats
Nebgen, Benjamin, Smith, Justin, Lubbers, Nicholas, Barros, Kipton, Li, Yin Wai, and Li, Wei. Active Learning Framework. Computer Software. https://github.com/lanl/alf. USDOE Laboratory Directed Research and Development (LDRD) Program. 04 Jan. 2023. Web. doi:10.11578/dc.20231103.4.
Nebgen, Benjamin, Smith, Justin, Lubbers, Nicholas, Barros, Kipton, Li, Yin Wai, & Li, Wei. (2023, January 04). Active Learning Framework. [Computer software]. https://github.com/lanl/alf. https://doi.org/10.11578/dc.20231103.4.
Nebgen, Benjamin, Smith, Justin, Lubbers, Nicholas, Barros, Kipton, Li, Yin Wai, and Li, Wei. "Active Learning Framework." Computer software. January 04, 2023. https://github.com/lanl/alf. https://doi.org/10.11578/dc.20231103.4.
@misc{doecode_115537,
  title = {Active Learning Framework},
  author = {Nebgen, Benjamin and Smith, Justin and Lubbers, Nicholas and Barros, Kipton and Li, Yin Wai and Li, Wei},
  abstractNote = {Machine learning (ML) of interatomic potentials shows great promise for accelerating scientific simulation, e.g., by emulating expensive computations at high accuracy but at a much reduced computational cost. Training datasets are computed with a computationally expensive ab initio quantum mechanics method, density functional theory (DFT). Trained on these data, an ML model can predict energies and forces for new atomic configurations very successfully. A critical factor is the quality and diversity of the training dataset. Thus, a highly automated approach to dataset construction, based on an active learning framework and suited to materials physics, has been designed.
The active learning scheme begins with fully randomized atomic configurations. Many molecular dynamics (MD) trajectories are then simulated using the current ML potentials, each initialized from a random, disordered configuration. The temperature is varied during these simulations to diversify the sampled configurations. The variance of the predictions of the eight neural networks in an ensemble is analyzed to determine whether the model is operating as expected: if the ensemble variance exceeds a threshold, collecting more data would likely help the model. In that case, the MD trajectory is terminated, and the final atomic configuration is placed on a queue (an SQL database) for DFT calculation and then added to the training dataset. Periodically, the ML model is retrained on the updated training dataset. This active learning loop is iterated until the MD simulations become prohibitively expensive. After many active learning iterations, the MD simulations will hopefully be robust enough to support nucleation. In this sense, the active learning scheme must automatically discover the important low-energy and nonequilibrium physics.},
  doi = {10.11578/dc.20231103.4},
  url = {https://doi.org/10.11578/dc.20231103.4},
  howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20231103.4}},
  year = {2023},
  month = jan
}