Improving deep learning-based protein distance prediction in CASP14
Abstract
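Motivation. Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment.
Results. Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps.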
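The evaluation measure named in this record's abstract, precision of the top L/5 long-range contact predictions, and the average predicted contact probability that the abstract reports as a confidence estimate, can be illustrated with a minimal Python sketch. It is not the MULTICOM code. The function names are hypothetical, the sequence separation of at least 24 residues for "long-range" is an assumed convention that the abstract does not state, and collapsing distance intervals into a contact probability by summing the bins below 8 Angstrom is one plausible reading of the bracketed definition.

```python
from typing import List, Sequence, Tuple

CONTACT_CUTOFF = 8.0  # Angstrom; contact definition quoted in the abstract
MIN_SEPARATION = 24   # ASSUMPTION: sequence separation treated as "long-range"


def contact_probability(bin_probs: Sequence[float],
                        bin_upper_edges: Sequence[float],
                        cutoff: float = CONTACT_CUTOFF) -> float:
    """Collapse a multi-interval distance distribution into P(distance < cutoff).

    Sums the probability of every interval whose upper edge is at or below
    the cutoff (hypothetical helper, not taken from the paper).
    """
    return sum(p for p, edge in zip(bin_probs, bin_upper_edges) if edge <= cutoff)


def top_long_range(prob: Sequence[Sequence[float]],
                   min_sep: int = MIN_SEPARATION) -> List[Tuple[float, int, int]]:
    """Return the top L/5 long-range pairs as (probability, i, j), best first."""
    length = len(prob)
    pairs = [(prob[i][j], i, j)
             for i in range(length)
             for j in range(i + min_sep, length)]
    pairs.sort(key=lambda t: t[0], reverse=True)
    return pairs[:max(1, length // 5)]


def precision_and_confidence(prob: Sequence[Sequence[float]],
                             true_dist: Sequence[Sequence[float]],
                             cutoff: float = CONTACT_CUTOFF) -> Tuple[float, float]:
    """Precision of the top L/5 long-range contacts and their mean probability."""
    top = top_long_range(prob)
    if not top:  # sequence too short to contain any long-range pair
        return 0.0, 0.0
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    mean_prob = sum(p for p, _, _ in top) / len(top)
    return hits / len(top), mean_prob


if __name__ == "__main__":
    # Toy 30-residue example: six confident predictions, four of them correct.
    L = 30
    prob = [[0.1] * L for _ in range(L)]
    dist = [[20.0] * L for _ in range(L)]
    confident = [(0, 24), (0, 25), (1, 26), (2, 27), (3, 28), (4, 29)]
    for n, (i, j) in enumerate(confident):
        prob[i][j] = 0.9
        if n < 4:
            dist[i][j] = 6.0
    print(precision_and_confidence(prob, dist))
```

The second value returned by `precision_and_confidence` needs no native structure, so under these assumptions it could be used to compare several predicted distance maps for the same target before the true distances are known.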
- Authors:
- Guo, Zhiye; Wu, Tianqi; Liu, Jian; Hou, Jie; Cheng, Jianlin
- Publication Date:
- 2021-05-07
- Research Org.:
- Georgia Institute of Technology, Atlanta, GA (United States); Donald Danforth Plant Science Center, St. Louis, MO (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); National Science Foundation (NSF); National Institutes of Health (NIH); USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1826198
- Alternate Identifier(s):
- OSTI ID: 1787924; OSTI ID: 1839183; OSTI ID: 2278966; OSTI ID: 2318538
- Grant/Contract Number:
- SC0020400; SC0021303; BIF132
- Resource Type:
- Published Article
- Journal Name:
- Bioinformatics
- Additional Journal Information:
- Journal Name: Bioinformatics; Journal Volume: 37; Journal Issue: 19; Journal ID: ISSN 1367-4803
- Publisher:
- Oxford University Press
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Guo, Zhiye, Wu, Tianqi, Liu, Jian, Hou, Jie, Cheng, Jianlin, and Martelli, ed., Pier Luigi. Improving deep learning-based protein distance prediction in CASP14. United Kingdom: N. p., 2021.
Web. doi:10.1093/bioinformatics/btab355.
Guo, Zhiye, Wu, Tianqi, Liu, Jian, Hou, Jie, Cheng, Jianlin, & Martelli, ed., Pier Luigi. Improving deep learning-based protein distance prediction in CASP14. United Kingdom. https://doi.org/10.1093/bioinformatics/btab355
Guo, Zhiye, Wu, Tianqi, Liu, Jian, Hou, Jie, Cheng, Jianlin, and Martelli, ed., Pier Luigi. 2021.
"Improving deep learning-based protein distance prediction in CASP14". United Kingdom. https://doi.org/10.1093/bioinformatics/btab355.
@article{osti_1826198,
title = {Improving deep learning-based protein distance prediction in CASP14},
author = {Guo, Zhiye and Wu, Tianqi and Liu, Jian and Hou, Jie and Cheng, Jianlin},
editor = {Martelli, Pier Luigi},
abstractNote = {Motivation. Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. Results. Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. 
However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps.},
doi = {10.1093/bioinformatics/btab355},
journal = {Bioinformatics},
number = 19,
volume = 37,
place = {United Kingdom},
year = {2021},
month = {5}
}
https://doi.org/10.1093/bioinformatics/btab355