DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Decoding the protein–ligand interactions using parallel graph neural networks

Journal Article · Scientific Reports

Abstract: Protein–ligand interactions (PLIs) are essential for biochemical functionality, and their identification is crucial for estimating biophysical properties for rational therapeutic design. Experimental characterization is currently the most accurate way to determine these properties, but it is very time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) that integrates knowledge representation and reasoning for PLI prediction, performing deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: $\mathrm{GNN}_{\mathrm{F}}$ is the base implementation that employs distinct featurization to enhance domain-awareness, while $\mathrm{GNN}_{\mathrm{P}}$ is a novel implementation that can predict with no prior knowledge of the intermolecular interactions. A comprehensive evaluation demonstrates that the GNNs can successfully capture the binary interactions between a ligand and a protein's 3D structure, with test accuracies of 0.979 for $\mathrm{GNN}_{\mathrm{F}}$ and 0.958 for $\mathrm{GNN}_{\mathrm{P}}$ in predicting the activity of a protein–ligand complex. These models are further adapted for regression tasks to predict experimental binding affinities and $\mathrm{pIC}_{50}$, which are crucial for assessing a compound's potency and efficacy. We achieve Pearson correlation coefficients of 0.66 and 0.65 on experimental affinity and 0.50 and 0.51 on $\mathrm{pIC}_{50}$ with $\mathrm{GNN}_{\mathrm{F}}$ and $\mathrm{GNN}_{\mathrm{P}}$, respectively, outperforming similar 2D sequence-based models.
Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicting the activity, potency, and biophysical properties of lead candidates. To this end, we show the utility of $\mathrm{GNN}_{\mathrm{P}}$ on SARS-CoV-2 protein targets by screening a large compound library and comparing the predictions with experimentally measured data.
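
The abstract reports three kinds of quantities: classification accuracy, the Pearson correlation coefficient, and $\mathrm{pIC}_{50}$. As a reading aid, the sketch below shows how such quantities are conventionally computed. It is not the authors' evaluation code. The function names are hypothetical, and the nanomolar input unit is an assumption made for illustration. The only definitions used are the standard ones: $\mathrm{pIC}_{50} = -\log_{10}(\mathrm{IC}_{50})$ with $\mathrm{IC}_{50}$ in mol/L, accuracy as the fraction of matching labels, and Pearson's r as covariance divided by the product of standard deviations.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50.

    pIC50 is the negative base-10 logarithm of IC50 expressed in mol/L.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

With these definitions, an $\mathrm{IC}_{50}$ of 1000 nM (1 µM) corresponds to a $\mathrm{pIC}_{50}$ of 6, and higher $\mathrm{pIC}_{50}$ values indicate more potent compounds.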

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1867072
Report Number(s):
PNNL-SA-166075; 7624; PII: 10418
Journal Information:
Scientific Reports, Vol. 12, Issue 1; ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
