DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Improving deep learning-based protein distance prediction in CASP14

Abstract

Motivation. Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment.

Results. Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps.
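The evaluation metric and the confidence estimate named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names, the array layout (an L x L x bins probability tensor), and the bin edges are hypothetical. The sequence-separation cutoff of 24 residues for "long-range" pairs is the usual CASP convention and is an assumption here, since the abstract does not state it. Only the 8 Angstrom contact threshold and the top L/5 selection come from the abstract.

```python
import numpy as np

def contact_prob_from_bins(bin_probs, upper_edges, threshold=8.0):
    """Collapse per-bin distance probabilities (L x L x n_bins) into a
    contact probability by summing the bins that lie entirely below
    the contact threshold. The bin layout is a hypothetical input."""
    mask = np.asarray(upper_edges) <= threshold
    return bin_probs[..., mask].sum(axis=-1)

def top_l5_long_range(contact_prob, true_dist, min_sep=24, threshold=8.0):
    """Return (precision, mean_probability) for the top L/5 long-range pairs.

    contact_prob : L x L array of predicted contact probabilities
    true_dist    : L x L array of native inter-residue distances (Angstrom)
    min_sep      : assumed minimum sequence separation for long-range pairs
    """
    L = contact_prob.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(L, k=min_sep)
    probs = contact_prob[i, j]
    k = max(1, L // 5)
    # Indices of the k most confident predictions.
    top = np.argsort(-probs, kind="stable")[:k]
    is_contact = true_dist[i[top], j[top]] < threshold
    # Precision: fraction of selected pairs that are true contacts.
    # Mean probability: the confidence proxy described in the abstract.
    return float(is_contact.mean()), float(probs[top].mean())
```

Given a predicted map and the native distances for one domain, `top_l5_long_range` returns the precision together with the average probability of the selected contacts. Comparing the two values across many domains is one way to check how well the average probability tracks accuracy.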

Authors:
Guo, Zhiye; Wu, Tianqi; Liu, Jian; Hou, Jie; Cheng, Jianlin
Publication Date:
2021-05-07
Research Org.:
Georgia Institute of Technology, Atlanta, GA (United States); Donald Danforth Plant Science Center, St. Louis, MO (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); National Science Foundation (NSF); National Institutes of Health (NIH); USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1826198
Alternate Identifier(s):
OSTI ID: 1787924; OSTI ID: 1839183; OSTI ID: 2278966; OSTI ID: 2318538
Grant/Contract Number:  
SC0020400; SC0021303; BIF132
Resource Type:
Published Article
Journal Name:
Bioinformatics
Additional Journal Information:
Journal Name: Bioinformatics; Journal Volume: 37; Journal Issue: 19; Journal ID: ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Guo, Zhiye, Wu, Tianqi, Liu, Jian, Hou, Jie, Cheng, Jianlin, and Martelli, Pier Luigi, ed. Improving deep learning-based protein distance prediction in CASP14. United Kingdom: N. p., 2021. Web. doi:10.1093/bioinformatics/btab355.
Guo, Zhiye, Wu, Tianqi, Liu, Jian, Hou, Jie, Cheng, Jianlin, & Martelli, Pier Luigi, ed. Improving deep learning-based protein distance prediction in CASP14. United Kingdom. https://doi.org/10.1093/bioinformatics/btab355
Guo, Zhiye, Wu, Tianqi, Liu, Jian, Hou, Jie, Cheng, Jianlin, and Martelli, Pier Luigi, ed. 2021. "Improving deep learning-based protein distance prediction in CASP14". United Kingdom. https://doi.org/10.1093/bioinformatics/btab355.
@article{osti_1826198,
title = {Improving deep learning-based protein distance prediction in CASP14},
author = {Guo, Zhiye and Wu, Tianqi and Liu, Jian and Hou, Jie and Cheng, Jianlin},
editor = {Martelli, Pier Luigi},
abstractNote = {Motivation. Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. Results. Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. 
However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps.},
doi = {10.1093/bioinformatics/btab355},
journal = {Bioinformatics},
number = 19,
volume = 37,
place = {United Kingdom},
year = {2021},
month = {5}
}
