OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?

Abstract

In the last few years, we have seen the rise of deep learning applications in a broad range of computational chemistry research problems. Using human-engineered chemical features, such as molecular descriptors and fingerprints, deep learning models have shown similar, if not better, performance than most traditional machine learning algorithms. Recently, we reported on the development of Chemception, a deep convolutional neural network (CNN) architecture for general-purpose small molecule property prediction. On average, Chemception matched the performance of expert-developed QSAR/QSPR models trained on chemical features (molecular fingerprints), despite being trained on just 2D images of molecular drawings with minimal chemical information. Here, we investigate the effects of systematically removing and adding basic chemical information to the image channels of the 2D images used to train Chemception. By augmenting our images with only three additional pieces of basic chemical information, we demonstrate improved Chemception performance: it is now more accurate than contemporary deep learning models trained on ECFP fingerprints for the prediction of toxicity, activity and solvation free energy, as well as physics-based free energy simulation methods for computing solvation properties. By altering the chemical information content in the image channels and examining the resulting performance of Chemception, we also identify two different “learning patterns” in toxicity/activity as compared to solvation free energy, which parallel the fundamental differences in contemporary chemistry research for predicting toxicity/activity and solvation free energy.

Authors:
Goh, Garrett B. [1]; Siegel, Charles M. [1]; Vishnu, Abhinav [1]; Hodas, Nathan O. [1]; Baker, Nathan A. [1]
  1. BATTELLE (PACIFIC NW LAB)
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1558182
Report Number(s):
PNNL-SA-127201
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: IEEE Winter Conference on Applications of Computer Vision (WACV 2018), March 12-15, 2018, Lake Tahoe, NV
Country of Publication:
United States
Language:
English
Subject:
Artificial Intelligence, Cheminformatics, Computational Chemistry, Deep Learning, Machine Learning

Citation Formats

Goh, Garrett B., Siegel, Charles M., Vishnu, Abhinav, Hodas, Nathan O., and Baker, Nathan A. How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?. United States: N. p., 2018. Web. doi:10.1109/WACV.2018.00151.
Goh, Garrett B., Siegel, Charles M., Vishnu, Abhinav, Hodas, Nathan O., & Baker, Nathan A. How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?. United States. doi:10.1109/WACV.2018.00151.
Goh, Garrett B., Siegel, Charles M., Vishnu, Abhinav, Hodas, Nathan O., and Baker, Nathan A. Mon . "How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?". United States. doi:10.1109/WACV.2018.00151.
@article{osti_1558182,
title = {How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?},
author = {Goh, Garrett B. and Siegel, Charles M. and Vishnu, Abhinav and Hodas, Nathan O. and Baker, Nathan A.},
abstractNote = {In the last few years, we have seen the rise of deep learning applications in a broad range of computational chemistry research problems. Using human-engineered chemical features, such as molecular descriptors and fingerprints, deep learning models have shown similar, if not better, performance than most traditional machine learning algorithms. Recently, we reported on the development of Chemception, a deep convolutional neural network (CNN) architecture for general-purpose small molecule property prediction. On average, Chemception matched the performance of expert-developed QSAR/QSPR models trained on chemical features (molecular fingerprints), despite being trained on just 2D images of molecular drawings with minimal chemical information. Here, we investigate the effects of systematically removing and adding basic chemical information to the image channels of the 2D images used to train Chemception. By augmenting our images with only three additional pieces of basic chemical information, we demonstrate improved Chemception performance: it is now more accurate than contemporary deep learning models trained on ECFP fingerprints for the prediction of toxicity, activity and solvation free energy, as well as physics-based free energy simulation methods for computing solvation properties. By altering the chemical information content in the image channels and examining the resulting performance of Chemception, we also identify two different “learning patterns” in toxicity/activity as compared to solvation free energy, which parallel the fundamental differences in contemporary chemistry research for predicting toxicity/activity and solvation free energy.},
doi = {10.1109/WACV.2018.00151},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {5}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
