OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Deep learning for computational chemistry

Abstract

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on “deep” neural networks. Within the last few years, we have seen the transformative impact of deep learning in the computer science domain, notably in speech recognition and computer vision, to the extent that the majority of practitioners in those fields are now regularly eschewing previously established models in favor of deep learning models. In this review, we provide an introductory overview of the theory of deep neural networks and their unique properties as compared to traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight their ubiquity and broad applicability to a wide range of challenges in the field, including QSAR, virtual screening, protein structure modeling, QM calculations, materials synthesis, and property prediction. In reviewing the performance of deep neural networks, we observed that they consistently outperformed non-neural-network state-of-the-art models across disparate research topics, and deep neural network-based models often exceeded the “glass ceiling” expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a useful tool and may grow to play a pivotal role in various challenges in the computational chemistry field.

Authors:
Goh, Garrett B. [1]; Hodas, Nathan O. [1]; Vishnu, Abhinav [1]
  1. Advanced Computing, Mathematics, and Data Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, Washington 99354
Publication Date:
March 2017
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1406688
Report Number(s):
PNNL-SA-121040
Journal ID: ISSN 0192-8651
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Journal Article
Journal Name:
Journal of Computational Chemistry
Additional Journal Information:
Journal Volume: 38; Journal Issue: 16; Journal ID: ISSN 0192-8651
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
Deep Learning; Computational Chemistry; Materials Genome; Quantum Chemistry; molecular modeling

Citation Formats

Goh, Garrett B., Hodas, Nathan O., and Vishnu, Abhinav. Deep learning for computational chemistry. United States: N. p., 2017. Web. doi:10.1002/jcc.24764.
Goh, Garrett B., Hodas, Nathan O., & Vishnu, Abhinav. Deep learning for computational chemistry. United States. doi:10.1002/jcc.24764.
Goh, Garrett B., Hodas, Nathan O., and Vishnu, Abhinav. 2017. "Deep learning for computational chemistry." United States. doi:10.1002/jcc.24764.
@article{osti_1406688,
title = {Deep learning for computational chemistry},
author = {Goh, Garrett B. and Hodas, Nathan O. and Vishnu, Abhinav},
abstractNote = {The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on “deep” neural networks. Within the last few years, we have seen the transformative impact of deep learning in the computer science domain, notably in speech recognition and computer vision, to the extent that the majority of practitioners in those fields are now regularly eschewing previously established models in favor of deep learning models. In this review, we provide an introductory overview of the theory of deep neural networks and their unique properties as compared to traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight their ubiquity and broad applicability to a wide range of challenges in the field, including QSAR, virtual screening, protein structure modeling, QM calculations, materials synthesis, and property prediction. In reviewing the performance of deep neural networks, we observed that they consistently outperformed non-neural-network state-of-the-art models across disparate research topics, and deep neural network-based models often exceeded the “glass ceiling” expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a useful tool and may grow to play a pivotal role in various challenges in the computational chemistry field.},
doi = {10.1002/jcc.24764},
journal = {Journal of Computational Chemistry},
issn = {0192-8651},
number = 16,
volume = 38,
place = {United States},
year = {2017},
month = {3}
}
