Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14
Abstract
Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To leverage improved inter-residue distance predictions for EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. In the evaluation of how well predictors select the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieved average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS), ranking first, second, and third among all 68 CASP14 predictors. MULTICOM-DEEP, our single-model EMA predictor, ranked within the top 10 among all single-model EMA methods according to GDT-TS loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully realize its potential.
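The GDT-TS loss quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual ranking-loss definition: for each target, the loss is the true GDT-TS of the best available model minus the true GDT-TS of the model that the predictor ranks first, and the reported figure is the mean over targets. The function names, model labels, and scores below are hypothetical, GDT-TS is assumed to be on a 0 to 1 scale, and nothing here is taken from the MULTICOM code.

```python
from statistics import mean


def gdt_ts_loss(predicted, true_gdt_ts):
    """Ranking loss for one target.

    predicted:    dict mapping model name -> predicted quality score
    true_gdt_ts:  dict mapping model name -> true GDT-TS (assumed 0..1 scale)
    """
    if not predicted or predicted.keys() != true_gdt_ts.keys():
        raise ValueError("both dicts must cover the same non-empty set of models")
    # Model the predictor would select: the one with the highest predicted score.
    top_model = max(predicted, key=predicted.get)
    # Loss: how far the selected model is from the best model actually available.
    return max(true_gdt_ts.values()) - true_gdt_ts[top_model]


def average_loss(targets):
    """Mean per-target loss over an iterable of (predicted, true_gdt_ts) pairs."""
    return mean(gdt_ts_loss(pred, true) for pred, true in targets)


# Hypothetical toy data: two targets with made-up scores.
targets = [
    ({"m1": 0.62, "m2": 0.70, "m3": 0.55}, {"m1": 0.58, "m2": 0.66, "m3": 0.71}),
    ({"m1": 0.81, "m2": 0.40}, {"m1": 0.84, "m2": 0.35}),
]

if __name__ == "__main__":
    print(f"average GDT-TS loss: {average_loss(targets):.3f}")
```

In the first toy target the predictor selects m2 although m3 is the better model, so it incurs a nonzero loss; in the second it selects the best model and the loss is zero. A lower average loss therefore means the predictor more often picks a model close to the best one.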
- Authors:
- Chen, Xiao; Liu, Jian; Guo, Zhiye; Wu, Tianqi; Hou, Jie; Cheng, Jianlin
- Publication Date:
- 2021-05-25
- Research Org.:
- Georgia Inst. of Technology, Atlanta, GA (United States). Georgia Tech Research Institute; Donald Danforth Plant Science Center, St. Louis, MO (United States)
- Sponsoring Org.:
- National Science Foundation (NSF); National Institutes of Health (NIH); USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1784581
- Alternate Identifier(s):
- OSTI ID: 1816544; OSTI ID: 1839182; OSTI ID: 2278965; OSTI ID: 2318539
- Grant/Contract Number:
- SC0020400; SC0021303; DBI 1759934; IIS1763246; GM093123
- Resource Type:
- Published Article
- Journal Name:
- Scientific Reports
- Additional Journal Information:
- Journal Name: Scientific Reports Journal Volume: 11 Journal Issue: 1; Journal ID: ISSN 2045-2322
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Chen, Xiao, Liu, Jian, Guo, Zhiye, Wu, Tianqi, Hou, Jie, and Cheng, Jianlin. Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. United Kingdom: N. p., 2021. Web. doi:10.1038/s41598-021-90303-6.
Chen, Xiao, Liu, Jian, Guo, Zhiye, Wu, Tianqi, Hou, Jie, & Cheng, Jianlin. Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. United Kingdom. https://doi.org/10.1038/s41598-021-90303-6
Chen, Xiao, Liu, Jian, Guo, Zhiye, Wu, Tianqi, Hou, Jie, and Cheng, Jianlin. 2021. "Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14". United Kingdom. https://doi.org/10.1038/s41598-021-90303-6.
@article{osti_1784581,
title = {Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14},
author = {Chen, Xiao and Liu, Jian and Guo, Zhiye and Wu, Tianqi and Hou, Jie and Cheng, Jianlin},
doi = {10.1038/s41598-021-90303-6},
journal = {Scientific Reports},
number = 1,
volume = 11,
place = {United Kingdom},
year = {2021},
month = {5}
}
https://doi.org/10.1038/s41598-021-90303-6