DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The MolSSI QCArchive project: An open-source platform to compute, organize, and share quantum chemistry data

Abstract

The Molecular Sciences Software Institute's (MolSSI) Quantum Chemistry Archive (QCArchive) project is an umbrella name that covers both a central server hosted by MolSSI for community data and the Python‐based software infrastructure that powers automated computation and storage of quantum chemistry (QC) results. The MolSSI‐hosted central server provides the computational molecular sciences community a location to freely access tens of millions of QC computations for machine learning, methodology assessment, force‐field fitting, and more through a Python interface. Facile, user‐friendly mining of the centrally archived quantum chemical data also can be achieved through web applications found at https://qcarchive.molssi.org. The software infrastructure can be used as a standalone platform to compute, structure, and distribute hundreds of millions of QC computations for individuals or groups of researchers at any scale. The QCArchive Infrastructure is open‐source (BSD‐3C); code repositories can be found at https://github.com/MolSSI, and releases can be downloaded via PyPI and Conda. This article is categorized under: Electronic Structure Theory > Ab Initio Electronic Structure Methods; Software > Quantum Chemistry; Data Science > Computer Algorithms and Programming

Authors:
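Smith, Daniel A. [1]; Altarawy, Doaa [2]; Burns, Lori A. [3]; Welborn, Matthew [1]; Naden, Levi N. [1]; Ward, Logan [4]; Ellis, Sam [1]; Pritchard, Benjamin P. [1]; Crawford, T. Daniel [5]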
  1. Molecular Sciences Software Inst., Blacksburg, VA (United States)
  2. Molecular Sciences Software Inst., Blacksburg, VA (United States); Alexandria Univ. (Egypt)
  3. Georgia Institute of Technology, Atlanta, GA (United States). Center for Computational Molecular Science and Technology
  4. Argonne National Lab. (ANL), Lemont, IL (United States)
  5. Molecular Sciences Software Inst., Blacksburg, VA (United States); Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
Publication Date:
2020-07-31
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
National Science Foundation (NSF); USDOE Exascale Computing Project; USDOE Office of Science (SC); USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1774122
Alternate Identifier(s):
OSTI ID: 1644230
Grant/Contract Number:  
AC02-06CH11357; 1449723; 1547580; 17‐SC‐20‐SC
Resource Type:
Accepted Manuscript
Journal Name:
Wiley Interdisciplinary Reviews: Computational Molecular Science
Additional Journal Information:
Journal Volume: 11; Journal Issue: 2; Journal ID: ISSN 1759-0876
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY; databases; density functional theory; high-throughput computing; machine learning; quantum chemistry

Citation Formats

Smith, Daniel A., Altarawy, Doaa, Burns, Lori A., Welborn, Matthew, Naden, Levi N., Ward, Logan, Ellis, Sam, Pritchard, Benjamin P., and Crawford, T. Daniel. The MolSSI QCArchive project: An open-source platform to compute, organize, and share quantum chemistry data. United States: N. p., 2020. Web. doi:10.1002/wcms.1491.
Smith, Daniel A., Altarawy, Doaa, Burns, Lori A., Welborn, Matthew, Naden, Levi N., Ward, Logan, Ellis, Sam, Pritchard, Benjamin P., & Crawford, T. Daniel. The MolSSI QCArchive project: An open-source platform to compute, organize, and share quantum chemistry data. United States. https://doi.org/10.1002/wcms.1491
Smith, Daniel A., Altarawy, Doaa, Burns, Lori A., Welborn, Matthew, Naden, Levi N., Ward, Logan, Ellis, Sam, Pritchard, Benjamin P., and Crawford, T. Daniel. 2020. "The MolSSI QCArchive project: An open-source platform to compute, organize, and share quantum chemistry data". United States. https://doi.org/10.1002/wcms.1491. https://www.osti.gov/servlets/purl/1774122.
@article{osti_1774122,
title = {The MolSSI QCArchive project: An open-source platform to compute, organize, and share quantum chemistry data},
author = {Smith, Daniel A. and Altarawy, Doaa and Burns, Lori A. and Welborn, Matthew and Naden, Levi N. and Ward, Logan and Ellis, Sam and Pritchard, Benjamin P. and Crawford, T. Daniel},
abstractNote = {The Molecular Sciences Software Institute's (MolSSI) Quantum Chemistry Archive (QCArchive) project is an umbrella name that covers both a central server hosted by MolSSI for community data and the Python‐based software infrastructure that powers automated computation and storage of quantum chemistry (QC) results. The MolSSI‐hosted central server provides the computational molecular sciences community a location to freely access tens of millions of QC computations for machine learning, methodology assessment, force‐field fitting, and more through a Python interface. Facile, user‐friendly mining of the centrally archived quantum chemical data also can be achieved through web applications found at https://qcarchive.molssi.org. The software infrastructure can be used as a standalone platform to compute, structure, and distribute hundreds of millions of QC computations for individuals or groups of researchers at any scale. The QCArchive Infrastructure is open‐source (BSD‐3C); code repositories can be found at https://github.com/MolSSI, and releases can be downloaded via PyPI and Conda. This article is categorized under: Electronic Structure Theory > Ab Initio Electronic Structure Methods; Software > Quantum Chemistry; Data Science > Computer Algorithms and Programming},
doi = {10.1002/wcms.1491},
journal = {Wiley Interdisciplinary Reviews: Computational Molecular Science},
number = 2,
volume = 11,
place = {United States},
year = {2020},
month = jul
}

Citation Metrics:
Cited by: 39 works
Citation information provided by
Web of Science

Works referenced in this record:

Geometry optimization made simple with translation and rotation coordinates
journal, June 2016

  • Wang, Lee-Ping; Song, Chenchen
  • The Journal of Chemical Physics, Vol. 144, Issue 21
  • DOI: 10.1063/1.4952956

Recent developments in the general atomic and molecular electronic structure system
journal, April 2020

  • Barca, Giuseppe M. J.; Bertoni, Colleen; Carrington, Laura
  • The Journal of Chemical Physics, Vol. 152, Issue 15
  • DOI: 10.1063/5.0005188

Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy
journal, January 2005

  • Weigend, Florian; Ahlrichs, Reinhart
  • Physical Chemistry Chemical Physics, Vol. 7, Issue 18, p. 3297-3305
  • DOI: 10.1039/b508541a

Psi4 1.1: An Open-Source Electronic Structure Program Emphasizing Automation, Advanced Libraries, and Interoperability
journal, June 2017

  • Parrish, Robert M.; Burns, Lori A.; Smith, Daniel G. A.
  • Journal of Chemical Theory and Computation, Vol. 13, Issue 7
  • DOI: 10.1021/acs.jctc.7b00174

Density‐functional thermochemistry. III. The role of exact exchange
journal, April 1993

  • Becke, Axel D.
  • The Journal of Chemical Physics, Vol. 98, Issue 7, p. 5648-5652
  • DOI: 10.1063/1.464913

Turbomole
journal, July 2013

  • Furche, Filipp; Ahlrichs, Reinhart; Hättig, Christof
  • Wiley Interdisciplinary Reviews: Computational Molecular Science, Vol. 4, Issue 2
  • DOI: 10.1002/wcms.1162

Accurate Noncovalent Interactions via Dispersion-Corrected Second-Order Møller–Plesset Perturbation Theory
journal, August 2018

  • Řezáč, Jan; Greenwell, Chandler; Beran, Gregory J. O.
  • Journal of Chemical Theory and Computation, Vol. 14, Issue 9
  • DOI: 10.1021/acs.jctc.8b00548

ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost
journal, January 2017

  • Smith, J. S.; Isayev, O.; Roitberg, A. E.
  • Chemical Science, Vol. 8, Issue 4
  • DOI: 10.1039/C6SC05720A

Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density
journal, January 1988


Making the most of materials computations
journal, October 2016


NWChem: Past, present, and future
journal, May 2020

  • Aprà, E.; Bylaska, E. J.; de Jong, W. A.
  • The Journal of Chemical Physics, Vol. 152, Issue 18
  • DOI: 10.1063/5.0004997

NGLview–interactive molecular graphics for Jupyter notebooks
journal, December 2017


Data Structures for Statistical Computing in Python
conference, January 2010


Molpro: a general-purpose quantum chemistry program package
journal, July 2011

  • Werner, Hans-Joachim; Knowles, Peter J.; Knizia, Gerald
  • Wiley Interdisciplinary Reviews: Computational Molecular Science, Vol. 2, Issue 2
  • DOI: 10.1002/wcms.82

ACCDB: A collection of chemistry databases for broad computational purposes
journal, December 2018

  • Morgante, Pierpaolo; Peverati, Roberto
  • Journal of Computational Chemistry, Vol. 40, Issue 6
  • DOI: 10.1002/jcc.25761

Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics
journal, August 2009

  • Ufimtsev, Ivan S.; Martinez, Todd J.
  • Journal of Chemical Theory and Computation, Vol. 5, Issue 10
  • DOI: 10.1021/ct9003004

FireWorks: a dynamic workflow system designed for high-throughput applications
journal, May 2015

  • Jain, Anubhav; Ong, Shyue Ping; Chen, Wei
  • Concurrency and Computation: Practice and Experience, Vol. 27, Issue 17
  • DOI: 10.1002/cpe.3505

NOMAD: The FAIR concept for big data-driven materials science
journal, September 2018


Benchmark database of accurate (MP2 and CCSD(T) complete basis set limit) interaction energies of small model complexes, DNA base pairs, and amino acid pairs
journal, January 2006

  • Jurečka, Petr; Šponer, Jiří; Černý, Jiří
  • Physical Chemistry Chemical Physics, Vol. 8, Issue 17, p. 1985-1993
  • DOI: 10.1039/B600027D

Dask: Parallel Computation with Blocked algorithms and Task Scheduling
conference, January 2015


ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules
journal, December 2017

  • Smith, Justin S.; Isayev, Olexandr; Roitberg, Adrian E.
  • Scientific Data, Vol. 4, Issue 1
  • DOI: 10.1038/sdata.2017.193

The Science Gateways Community Institute at Two Years
conference, July 2018

  • Wilkins-Diehr, Nancy; Zentner, Michael; Pierce, Marlon
  • PEARC '18: Practice and Experience in Advanced Research Computing, Proceedings of the Practice and Experience on Advanced Research Computing
  • DOI: 10.1145/3219104.3219142

Driving torsion scans with wavefront propagation
journal, June 2020

  • Qiu, Yudong; Smith, Daniel G. A.; Stern, Chaya D.
  • The Journal of Chemical Physics, Vol. 152, Issue 24
  • DOI: 10.1063/5.0009232

Less is more: Sampling chemical space with active learning
journal, June 2018

  • Smith, Justin S.; Nebgen, Ben; Lubbers, Nicholas
  • The Journal of Chemical Physics, Vol. 148, Issue 24
  • DOI: 10.1063/1.5023802

Commentary: The Materials Project: A materials genome approach to accelerating materials innovation
journal, July 2013

  • Jain, Anubhav; Ong, Shyue Ping; Hautier, Geoffroy
  • APL Materials, Vol. 1, Issue 1
  • DOI: 10.1063/1.4812323

The FAIR Guiding Principles for scientific data management and stewardship
journal, March 2016

  • Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan
  • Scientific Data, Vol. 3, Issue 1
  • DOI: 10.1038/sdata.2016.18

Psi4 1.4: Open-source software for high-throughput quantum chemistry
journal, May 2020

  • Smith, Daniel G. A.; Burns, Lori A.; Simmonett, Andrew C.
  • The Journal of Chemical Physics, Vol. 152, Issue 18
  • DOI: 10.1063/5.0006002

Benchmark Database of Barrier Heights for Heavy Atom Transfer, Nucleophilic Substitution, Association, and Unimolecular Reactions and Its Use to Test Theoretical Methods
journal, March 2005

  • Zhao, Yan; González-García, Núria; Truhlar, Donald G.
  • The Journal of Physical Chemistry A, Vol. 109, Issue 9
  • DOI: 10.1021/jp045141s

The BioFragment Database (BFDb): An open-data platform for computational chemistry analysis of noncovalent interactions
journal, October 2017

  • Burns, Lori A.; Faver, John C.; Zheng, Zheng
  • The Journal of Chemical Physics, Vol. 147, Issue 16
  • DOI: 10.1063/1.5001028

Binder 2.0 - Reproducible, interactive, sharable environments for science at scale
conference, January 2018


Parsl: Pervasive Parallel Programming in Python
conference, January 2019

  • Babuji, Yadu; Foster, Ian; Wilde, Michael
  • Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '19
  • DOI: 10.1145/3307681.3325400

Building Blocks for Workflow System Middleware
conference, May 2018

  • Turilli, Matteo; Merzky, Andre; Balasubramanian, Vivek
  • 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
  • DOI: 10.1109/CCGRID.2018.00051

Quantum-chemical insights from deep tensor neural networks
journal, January 2017

  • Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan
  • Nature Communications, Vol. 8, Issue 1
  • DOI: 10.1038/ncomms13890

PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry
journal, May 2017

  • Nakata, Maho; Shimazaki, Tomomi
  • Journal of Chemical Information and Modeling, Vol. 57, Issue 6
  • DOI: 10.1021/acs.jcim.7b00083

Advances in molecular quantum chemistry contained in the Q-Chem 4 program package
journal, September 2014


Dataset’s chemical diversity limits the generalizability of machine learning predictions
journal, November 2019

  • Glavatskikh, Marta; Leguy, Jules; Hunault, Gilles
  • Journal of Cheminformatics, Vol. 11, Issue 1
  • DOI: 10.1186/s13321-019-0391-2

Performance of Ab Initio and Density Functional Methods for Conformational Equilibria of CnH2n+2 Alkane Isomers (n = 4−8)
journal, October 2009

  • Gruzman, David; Karton, Amir; Martin, Jan M. L.
  • The Journal of Physical Chemistry A, Vol. 113, Issue 43
  • DOI: 10.1021/jp903640h

PubChem 2019 update: improved access to chemical data
journal, October 2018

  • Kim, Sunghwan; Chen, Jie; Cheng, Tiejun
  • Nucleic Acids Research, Vol. 47, Issue D1
  • DOI: 10.1093/nar/gky1033

Quantum chemistry structures and properties of 134 kilo molecules
journal, August 2014

  • Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
  • Scientific Data, Vol. 1, Issue 1
  • DOI: 10.1038/sdata.2014.22

SLURM: Simple Linux Utility for Resource Management
book, January 2003

  • Yoo, Andy B.; Jette, Morris A.; Grondona, Mark
  • Job Scheduling Strategies for Parallel Processing
  • DOI: 10.1007/10968987_3

Formal Estimation of Errors in Computed Absolute Interaction Energies of Protein−Ligand Complexes
journal, February 2011

  • Faver, John C.; Benson, Mark L.; He, Xiao
  • Journal of Chemical Theory and Computation, Vol. 7, Issue 3
  • DOI: 10.1021/ct100563b

OpenMM 7: Rapid development of high performance algorithms for molecular dynamics
journal, July 2017


A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu
journal, April 2010

  • Grimme, Stefan; Antony, Jens; Ehrlich, Stephan
  • The Journal of Chemical Physics, Vol. 132, Issue 15
  • DOI: 10.1063/1.3382344