Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution

Dinpajooh, Mohammadhasan; LaCount, Michael D; Muller, Scott E; Henson, Neil J; Mejia Rodriguez, Daniel; Gomez, Axel; Mundy, Christopher J; Ritzmann, Andrew M

doi:10.25584/3004762


Abstract

This dataset was generated using an iterative active-learning strategy with the ArcaNN software package (https://github.com/arcann-chem/arcann_training) to train machine-learning interatomic potentials (MLIPs) for aqueous nitric acid. Each active-learning cycle consisted of three stages: (1) training, (2) exploration, and (3) labeling. The initial training set comprised approximately 800 randomly selected configurations from a previous study by Lewis et al. (https://doi.org/10.1021/jp205510q), which investigated nitric acid solutions at 2, 3, 4, and 5 mol/L. For all configurations, single-point calculations of atomic forces and total energies were performed at the BLYP-D2 and PBE-D3 levels of density functional theory using the CP2K Quickstep module. Valence electrons were treated explicitly, while core electrons on all atoms were represented by norm-conserving Goedecker–Teter–Hutter (GTH) pseudopotentials. Long-range dispersion interactions were accounted for using Grimme dispersion corrections. Wave functions were expanded in a mixed Gaussian-and-plane-wave scheme using TZV2P-MOLOPT basis sets for all elements and an 800 Ry auxiliary plane-wave cutoff for the electron density. Self-consistent field convergence was accelerated using orbital transformation and Direct Inversion in the Iterative Subspace (DIIS), with a convergence threshold of 10^{-6}. All single-point calculations were carried out in periodic orthorhombic cells whose dimensions match those of the molecular configurations sampled from earlier trajectories. The CELL_REF keyword in CP2K was used to define a fixed reference cell, ensuring consistency in the reference data used for MLIP training, particularly when cell fluctuations are present in NpT simulations. The resulting high-fidelity energies and forces constitute the ground-truth labels used to train the MLIPs contained in this dataset.
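The three-stage cycle named in the abstract (training, exploration, labeling) can be illustrated with a toy, standard-library-only sketch. Everything in it is a hypothetical stand-in rather than ArcaNN's API or the authors' workflow: `reference_label` plays the role of the expensive DFT single-point calculation, the "models" are bootstrap nearest-neighbour regressors on a 1D function, and candidates are selected by committee disagreement, which is an assumed selection criterion that the record itself does not specify.

```python
import math
import random
import statistics


def reference_label(x):
    """Stand-in for the expensive reference calculation (DFT in the real dataset)."""
    return math.sin(3.0 * x)


def train_committee(data, n_models, rng):
    """Training stage: each 'model' is a bootstrap resample of the labeled data."""
    return [[rng.choice(data) for _ in data] for _ in range(n_models)]


def predict(model, x):
    """Toy model prediction: label of the nearest stored point."""
    return min(model, key=lambda point: abs(point[0] - x))[1]


def explore(committee, n_candidates, threshold, max_new, rng):
    """Exploration stage: sample candidates and keep those the committee disagrees on."""
    candidates = [rng.uniform(0.0, 2.0) for _ in range(n_candidates)]
    scored = []
    for x in candidates:
        spread = statistics.pstdev([predict(m, x) for m in committee])
        if spread > threshold:
            scored.append((spread, x))
    scored.sort(reverse=True)  # most uncertain candidates first
    return [x for _, x in scored[:max_new]]


def active_learning(n_cycles=5, seed=0):
    """Run train -> explore -> label cycles; return the data and the set size per cycle."""
    rng = random.Random(seed)
    data = [(x, reference_label(x)) for x in (rng.uniform(0.0, 2.0) for _ in range(8))]
    history = [len(data)]
    for _ in range(n_cycles):
        committee = train_committee(data, n_models=4, rng=rng)
        selected = explore(committee, n_candidates=200, threshold=0.05, max_new=10, rng=rng)
        if not selected:
            break  # nothing uncertain left to label
        # Labeling stage: compute reference values only for the selected candidates.
        data.extend((x, reference_label(x)) for x in selected)
        history.append(len(data))
    return data, history


if __name__ == "__main__":
    _, sizes = active_learning()
    print("training-set size per cycle:", sizes)
```

The point of the design is that the costly reference calculation runs only on configurations selected during exploration, so the labeled set grows where the current models are least reliable.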

Authors:

Dinpajooh, Mohammadhasan; LaCount, Michael D; Muller, Scott E; Henson, Neil J; Mejia Rodriguez, Daniel; Gomez, Axel; Mundy, Christopher J; Ritzmann, Andrew M

Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)

Publication Date: Thu Nov 20 23:00:00 EST 2025

DOE Contract Number: AC05-76RL01830

Research Org.: PNNL (PNNL2)

Sponsoring Org.: USDOE Office of Science (SC), Basic Energy Sciences (BES)

OSTI Identifier: 3004762

DOI: https://doi.org/10.25584/3004762

Citation Formats

Dinpajooh, Mohammadhasan, LaCount, Michael D, Muller, Scott E, Henson, Neil J, Mejia Rodriguez, Daniel, Gomez, Axel, Mundy, Christopher J, and Ritzmann, Andrew M. Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution. United States: N. p., 2025. Web. doi:10.25584/3004762.

Dinpajooh, Mohammadhasan, LaCount, Michael D, Muller, Scott E, Henson, Neil J, Mejia Rodriguez, Daniel, Gomez, Axel, Mundy, Christopher J, & Ritzmann, Andrew M. Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution. United States. https://doi.org/10.25584/3004762

Dinpajooh, Mohammadhasan, LaCount, Michael D, Muller, Scott E, Henson, Neil J, Mejia Rodriguez, Daniel, Gomez, Axel, Mundy, Christopher J, and Ritzmann, Andrew M. 2025. "Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution". United States. https://doi.org/10.25584/3004762. https://www.osti.gov/servlets/purl/3004762. Pub date: Thu Nov 20 23:00:00 EST 2025

@article{osti_3004762,
  title        = {Datasets for Custom-trained Machine-learning Interatomic Potentials: Nitric Acid Aqueous Solution},
  author       = {Dinpajooh, Mohammadhasan and LaCount, Michael D and Muller, Scott E and Henson, Neil J and Mejia Rodriguez, Daniel and Gomez, Axel and Mundy, Christopher J and Ritzmann, Andrew M},
  abstractNote = {This dataset was generated using an iterative active learning strategy with the ArcaNN software package (https://github.com/arcann-chem/arcann_training) to train machine-learning interatomic potentials (MLIPs) for aqueous nitric acid. Each active-learning cycle consisted of three stages: (1) training, (2) exploration, and (3) labeling. The initial training set comprised approximately 800 randomly selected configurations from a previous study by Lewis et al. (https://doi.org/10.1021/jp205510q), which investigated nitric acid solutions at 2, 3, 4, and 5 mol/L. For all configurations, single-point calculations of atomic forces and total energies were performed at the quantum density functional theory BLYP-D2 and PBE-D3 levels of theory using the CP2K Quickstep module. Valence electrons were treated explicitly, while core electrons on all atoms were represented by norm-conserving Goedecker–Teter–Hutter (GTH) pseudopotentials. Long-range dispersion interactions were accounted for using Grimme dispersion corrections. Wave functions were expanded in a mixed Gaussian-and-plane-wave scheme using TZV2P-MOLOPT basis sets for all elements and an 800 Ry auxiliary plane-wave cutoff for the electron density. Self-consistent field convergence was accelerated using orbital transformation and Direct Inversion in the Iterative Subspace, with a convergence threshold of 10^{-6}. All single-point calculations were carried out in periodic orthorhombic cells whose dimensions match those of the molecular configurations sampled from earlier trajectories. The CELL_REF keyword in CP2K was used to define a fixed reference cell, ensuring consistency in the reference data used for MLIP training, particularly when cell fluctuations are present in NpT simulations. The resulting high-fidelity energies and forces constitute the ground-truth labels used to train the MLIPs contained in this dataset.},

  doi          = {10.25584/3004762},
  place        = {United States},
  year         = {2025},
  month        = nov
}


Similar records in DOE Data Explorer and OSTI.GOV collections:

Dataset, Code, and Models for Training Deep Learning Potentials for Low Temperature Plasma-Surface Interactions

Dataset: Draney, Jack S.; Panagiotopoulos, Athanassios; Graves, David

This repository contains the datasets, training scripts, finished models, and test simulations used in the development of DeepREBO, a machine-learned interatomic potential trained to emulate the REBO2 empirical potential. The data was generated to study deep potential development for simulations of plasma-surface interactions. It uses an active learning framework, starting from a minimal dataset and iteratively expanding it. Included are the generated datasets, the trained models, and the simulations used to evaluate the performance of the training process. This resource supports reproducibility and provides a reference framework for training deep potentials in plasma-surface interaction studies.

Scalable Solutions for Training Machine Learned Interatomic Potentials.

Conference: Wood, Mitchell; Sievers, Charles; Perez, Danny; ...

Abstract not provided.

Formation of Zinc Carbonate Phases on Dissolving Calcite, Aragonite, and Vaterite in Acidic Aqueous Solutions (publication dataset)

Dataset: Kim, YoungJae; Lee, Sang Soo; Abdilla, Bektur; ...

FitSNAP: Scalable Solutions for Training Machine Learned Interatomic Potentials.

Conference: Wood, Mitchell

Abstract not provided.