Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution
Abstract
This dataset was generated using an iterative active learning strategy with the ArcaNN software package (https://github.com/arcann-chem/arcann_training) to train machine-learning interatomic potentials (MLIPs) for aqueous nitric acid. Each active-learning cycle consisted of three stages: (1) training, (2) exploration, and (3) labeling. The initial training set comprised approximately 800 randomly selected configurations from a previous study by Lewis et al. (https://doi.org/10.1021/jp205510q), which investigated nitric acid solutions at 2, 3, 4, and 5 mol/L. For all configurations, single-point calculations of atomic forces and total energies were performed at the BLYP-D2 and PBE-D3 levels of density functional theory using the CP2K Quickstep module. Valence electrons were treated explicitly, while core electrons on all atoms were represented by norm-conserving Goedecker–Teter–Hutter (GTH) pseudopotentials. Long-range dispersion interactions were accounted for using Grimme dispersion corrections. Wave functions were expanded in a mixed Gaussian-and-plane-wave scheme using TZV2P-MOLOPT basis sets for all elements and an 800 Ry auxiliary plane-wave cutoff for the electron density. Self-consistent field convergence was accelerated using orbital transformation and Direct Inversion in the Iterative Subspace, with a convergence threshold of 10^{-6}. All single-point calculations were carried out in periodic orthorhombic cells whose dimensions match those of the molecular configurations sampled from earlier trajectories. The CELL_REF keyword in CP2K was used to define a fixed reference cell, ensuring consistency in the reference data used for MLIP training, particularly when cell fluctuations are present in NpT simulations. The resulting high-fidelity energies and forces constitute the ground-truth labels used to train the MLIPs contained in this dataset.
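The labeling settings described in the abstract can be sketched as a CP2K input fragment. The Python helper below is only an illustrative assumption, not the input actually used to build this dataset: the function name `cp2k_single_point_input`, the cell lengths, and the choice of `MINIMIZER DIIS` inside `&OT` are hypothetical, and the fragment omits the `&GLOBAL` section, coordinates, and the `&KIND` blocks that would select the TZV2P-MOLOPT basis sets and GTH pseudopotentials.

```python
def cp2k_single_point_input(functional, dispersion, abc, abc_ref):
    """Render a hypothetical CP2K FORCE_EVAL fragment for one configuration.

    functional : XC functional shortcut, e.g. "BLYP" or "PBE"
    dispersion : Grimme pair-potential type, e.g. "DFTD2" or "DFTD3"
    abc        : (a, b, c) orthorhombic cell lengths of the configuration
    abc_ref    : (a, b, c) lengths of the fixed reference cell (CELL_REF)
    """
    a, b, c = abc
    ra, rb, rc = abc_ref
    return f"""&FORCE_EVAL
  METHOD Quickstep
  &DFT
    &MGRID
      CUTOFF 800
    &END MGRID
    &SCF
      EPS_SCF 1.0E-6
      &OT
        MINIMIZER DIIS
      &END OT
    &END SCF
    &XC
      &XC_FUNCTIONAL {functional}
      &END XC_FUNCTIONAL
      &VDW_POTENTIAL
        POTENTIAL_TYPE PAIR_POTENTIAL
        &PAIR_POTENTIAL
          TYPE {dispersion}
          REFERENCE_FUNCTIONAL {functional}
        &END PAIR_POTENTIAL
      &END VDW_POTENTIAL
    &END XC
  &END DFT
  &SUBSYS
    &CELL
      ABC {a:.4f} {b:.4f} {c:.4f}
      PERIODIC XYZ
      &CELL_REF
        ABC {ra:.4f} {rb:.4f} {rc:.4f}
      &END CELL_REF
    &END CELL
  &END SUBSYS
&END FORCE_EVAL
"""


if __name__ == "__main__":
    # Placeholder cell lengths in angstrom; the real values come from each configuration.
    print(cp2k_single_point_input("BLYP", "DFTD2", (15.0, 15.0, 15.0), (15.5, 15.5, 15.5)))
```

Calling the helper once with `("BLYP", "DFTD2")` and once with `("PBE", "DFTD3")` would give one fragment per level of theory named in the abstract, with the same reference cell shared across configurations whose actual cell sizes differ.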
- Authors:
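- Dinpajooh, Mohammadhasan; LaCount, Michael D; Muller, Scott E; Henson, Neil J; Mejia Rodriguez, Daniel; Gomez, Axel; Mundy, Christopher J; Ritzmann, Andrew M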
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Publication Date:
- DOE Contract Number:
- AC05-76RL01830
- Research Org.:
- PNNL (PNNL2)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- OSTI Identifier:
- 3004762
- DOI:
- https://doi.org/10.25584/3004762
Citation Formats
Dinpajooh, Mohammadhasan, LaCount, Michael D, Muller, Scott E, Henson, Neil J, Mejia Rodriguez, Daniel, Gomez, Axel, Mundy, Christopher J, and Ritzmann, Andrew M. Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution. United States: N. p., 2025. Web. doi:10.25584/3004762.
Dinpajooh, Mohammadhasan, LaCount, Michael D, Muller, Scott E, Henson, Neil J, Mejia Rodriguez, Daniel, Gomez, Axel, Mundy, Christopher J, & Ritzmann, Andrew M. Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution. United States. https://doi.org/10.25584/3004762
Dinpajooh, Mohammadhasan, LaCount, Michael D, Muller, Scott E, Henson, Neil J, Mejia Rodriguez, Daniel, Gomez, Axel, Mundy, Christopher J, and Ritzmann, Andrew M. 2025. "Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution". United States. https://doi.org/10.25584/3004762. https://www.osti.gov/servlets/purl/3004762. Pub date: Thu Nov 20 23:00:00 EST 2025
@article{osti_3004762,
title = {Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution},
author = {Dinpajooh, Mohammadhasan and LaCount, Michael D and Muller, Scott E and Henson, Neil J and Mejia Rodriguez, Daniel and Gomez, Axel and Mundy, Christopher J and Ritzmann, Andrew M},
abstractNote = {This dataset was generated using an iterative active learning strategy with the ArcaNN software package (https://github.com/arcann-chem/arcann_training) to train machine-learning interatomic potentials (MLIPs) for aqueous nitric acid. Each active-learning cycle consisted of three stages: (1) training, (2) exploration, and (3) labeling. The initial training set comprised approximately 800 randomly selected configurations from a previous study by Lewis et al. (https://doi.org/10.1021/jp205510q), which investigated nitric acid solutions at 2, 3, 4, and 5 mol/L. For all configurations, single-point calculations of atomic forces and total energies were performed at the BLYP-D2 and PBE-D3 levels of density functional theory using the CP2K Quickstep module. Valence electrons were treated explicitly, while core electrons on all atoms were represented by norm-conserving Goedecker–Teter–Hutter (GTH) pseudopotentials. Long-range dispersion interactions were accounted for using Grimme dispersion corrections. Wave functions were expanded in a mixed Gaussian-and-plane-wave scheme using TZV2P-MOLOPT basis sets for all elements and an 800 Ry auxiliary plane-wave cutoff for the electron density. Self-consistent field convergence was accelerated using orbital transformation and Direct Inversion in the Iterative Subspace, with a convergence threshold of 10^{-6}. All single-point calculations were carried out in periodic orthorhombic cells whose dimensions match those of the molecular configurations sampled from earlier trajectories. The CELL_REF keyword in CP2K was used to define a fixed reference cell, ensuring consistency in the reference data used for MLIP training, particularly when cell fluctuations are present in NpT simulations. The resulting high-fidelity energies and forces constitute the ground-truth labels used to train the MLIPs contained in this dataset.},
doi = {10.25584/3004762},
journal = {},
place = {United States},
year = {2025},
month = nov
}
