OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules

Journal Article · Scientific Data

Abstract: Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme and contrast it against existing data sets. The ANI-1x data set contains multiple QM properties from 5 million density functional theory calculations, while the ANI-1ccx data set contains 500 thousand data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.

Research Organization:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC); USDOE Laboratory Directed Research and Development (LDRD) Program; National Science Foundation (NSF)
Grant/Contract Number:
89233218CNA000001; N00014-16-1-2311; CHE-1802789; DMR110088; ACI-1053575; 1148698
OSTI ID:
1765938
Alternate ID(s):
OSTI ID: 1819139
Report Number(s):
LA-UR-19-29769; 134; PII: 473
Journal Information:
Scientific Data, Vol. 7, Issue 1; ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
