OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Evaluating point-prediction uncertainties in neural networks for protein-ligand binding prediction

Journal Article · Artificial Intelligence Chemistry
[1]; [2]; [2]; [3]; [4]; [4]; [4]
  1. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing
  2. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States). Biological Science and Security Center
  3. Frederick National Laboratory for Cancer Research, Frederick, MD (United States)
  4. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Neural network (NN) models have the potential to speed up the drug discovery process and reduce its failure rates. For NN models to succeed, they need uncertainty quantification (UQ), because drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information, and some UQ methods require changing the NN architecture or training procedure, which limits the choice of NN models. Moreover, predictive uncertainty can come from different sources. It is important to model the different types of predictive uncertainty separately, because different actions can be taken depending on the source of the uncertainty. In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models for protein-ligand binding prediction. We use prior knowledge of chemical compounds to design the experiments. Using a visualization method, we create non-overlapping, chemically diverse partitions from a collection of chemical compounds. These partitions serve as training and test set splits for exploring NN model uncertainty. We demonstrate how the uncertainties estimated by the selected methods describe different sources of uncertainty under different partitions and featurization schemes, and how they relate to prediction error.
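
As an illustration of what modeling different sources of predictive uncertainty separately can look like, the sketch below splits the predictive variance for a single compound into a data-noise part and a model-disagreement part using the law of total variance. This is not taken from the article: the ensemble setup, the terms "aleatoric" and "epistemic", the function name `decompose_uncertainty`, and all numbers are assumptions chosen for illustration, and the methods the paper actually evaluates may differ.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split predictive variance for one input into two parts.

    member_means: mean prediction from each ensemble member (hypothetical setup)
    member_variances: noise variance predicted by each ensemble member
    """
    if not member_means or len(member_means) != len(member_variances):
        raise ValueError("need one mean and one variance per ensemble member")
    # Data (aleatoric) part: average of the noise variances the members predict.
    aleatoric = fmean(member_variances)
    # Model (epistemic) part: disagreement between the members' mean predictions.
    epistemic = pvariance(member_means)
    return {
        "prediction": fmean(member_means),
        "aleatoric": aleatoric,
        "epistemic": epistemic,
        "total": aleatoric + epistemic,
    }


# Made-up binding-affinity predictions from five ensemble members.
# First compound: members agree, as one might expect near the training data.
in_distribution = decompose_uncertainty(
    [6.1, 6.0, 6.2, 6.1, 6.0], [0.30, 0.28, 0.31, 0.29, 0.30]
)
# Second compound: members disagree, as one might expect far from the training data.
out_of_distribution = decompose_uncertainty(
    [5.0, 7.2, 6.1, 4.4, 7.9], [0.30, 0.33, 0.29, 0.31, 0.30]
)

print(in_distribution)
print(out_of_distribution)
```

In this toy setup the noise term is similar for both compounds, while the disagreement term is much larger for the second one. That is the kind of distinction that could motivate different follow-up actions, for example collecting more training data when the disagreement term dominates.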

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); Defense Threat Reduction Agency (DTRA); National Institutes of Health (NIH); Department of Health and Human Services
Grant/Contract Number:
AC52-07NA27344; HDTRA1036045; 75N91019D00024; 75N91019F00134
OSTI ID:
1988215
Report Number(s):
LLNL-JRNL-839676; 1060646
Journal Information:
Artificial Intelligence Chemistry, Vol. 1, Issue 1; ISSN 2949-7477
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
