U.S. Department of Energy, Office of Scientific and Technical Information (OSTI.GOV)

Title: QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules

Journal Article · Scientific Data

We introduce QM7-X, a comprehensive dataset of 42 physicochemical properties for ≈4.2 million equilibrium and non-equilibrium structures of small organic molecules with up to seven non-hydrogen (C, N, O, S, Cl) atoms. To span this fundamentally important region of chemical compound space (CCS), QM7-X includes an exhaustive sampling of (meta-)stable equilibrium structures, comprising constitutional/structural isomers and stereoisomers (e.g., enantiomers and diastereomers, including cis/trans and conformational isomers), together with 100 non-equilibrium structural variations of each equilibrium structure, for a total of ≈4.2 million molecular structures. Computed at the tightly converged quantum-mechanical PBE0+MBD level of theory, QM7-X contains global (molecular) and local (atom-in-a-molecule) properties, ranging from ground-state quantities (such as atomization energies and dipole moments) to response quantities (such as polarizability tensors and dispersion coefficients). By providing a systematic, extensive, and tightly converged dataset of quantum-mechanically computed physicochemical properties, we expect QM7-X to play a critical role in developing next-generation machine-learning-based models for exploring larger swaths of CCS and for the in silico design of molecules with targeted properties.
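The composition limits and the structure count stated in the abstract can be made concrete with a short Python sketch. It assumes molecular formulas are given as simple strings such as `C2H6O` (no parentheses, charges, or isotope labels), treats "up to seven non-hydrogen atoms" as one to seven heavy atoms, and reads "100 non-equilibrium structural variations" as 100 per equilibrium structure. The helper names `parse_formula` and `in_qm7x_composition_space` are hypothetical, not part of any QM7-X tooling, and passing the check is only a necessary condition: the dataset's actual enumeration of chemically valid molecules is not reproduced here.

```python
import re
from collections import Counter

# Composition limits as stated in the QM7-X abstract.
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "S", "Cl"}
MAX_HEAVY_ATOMS = 7

# Regexes for a deliberately simple formula grammar (an assumption of this sketch).
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula like 'C2H6O' into element counts.

    Raises ValueError for strings outside the simple grammar above.
    """
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for symbol, number in _TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def in_qm7x_composition_space(formula: str) -> bool:
    """Necessary (not sufficient) composition check for QM7-X membership."""
    counts = parse_formula(formula)
    if set(counts) - ALLOWED_ELEMENTS:
        return False  # contains an element outside C, N, O, S, Cl, H (e.g. F)
    heavy = sum(n for element, n in counts.items() if element != "H")
    # Assumption: at least one heavy atom is required (excludes bare H2).
    return 1 <= heavy <= MAX_HEAVY_ATOMS


# Rough count from the abstract's rounded totals: each equilibrium
# structure contributes itself plus 100 non-equilibrium variations.
TOTAL_STRUCTURES = 4.2e6
NONEQ_PER_EQUILIBRIUM = 100
approx_equilibrium = TOTAL_STRUCTURES / (1 + NONEQ_PER_EQUILIBRIUM)

if __name__ == "__main__":
    for f in ["C2H6O", "C6H5Cl", "C8H18", "CH3F"]:
        print(f, in_qm7x_composition_space(f))
    print(f"approx. equilibrium structures: {approx_equilibrium:,.0f}")
```

For example, ethanol (C2H6O) and chlorobenzene (C6H5Cl, exactly seven heavy atoms) pass the check, while octane (C8H18) and fluoromethane (CH3F) do not. The estimate of roughly 4.2 × 10^4 equilibrium structures follows only from the rounded ≈4.2 million total, so it should be read as an order-of-magnitude figure rather than the dataset's exact count.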

Research Organization: Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization: European Research Council (ERC); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities Division
Grant/Contract Number: AC02-06CH11357
OSTI ID: 1777928
Journal Information: Scientific Data, Vol. 8, Issue 1; ISSN 2052-4463
Publisher: Nature Publishing Group
Country of Publication: United States
Language: English
