DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14

Abstract

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage improved inter-residue distance predictions for EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. Evaluated on selecting the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieved average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods ranked first, second, and third among all 68 CASP14 predictors. MULTICOM-DEEP, our single-model EMA predictor, ranked within the top 10 among all single-model EMA methods according to GDT-TS loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.
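
The GDT-TS loss reported in the abstract can be illustrated with a minimal sketch. It assumes the usual CASP convention: for each target, the loss is the true GDT-TS of the best model in the pool minus the true GDT-TS of the model the predictor ranks first, with scores on a 0 to 1 scale, averaged over targets. The function names and all scores below are hypothetical and are not taken from the paper or from CASP14 data.

```python
from statistics import mean


def gdt_ts_loss(true_scores, predicted_scores):
    """Per-target loss: best true GDT-TS minus true GDT-TS of the top-ranked model.

    Both arguments map a model name to a score on a 0-1 scale (assumed).
    """
    if true_scores.keys() != predicted_scores.keys():
        raise ValueError("true and predicted scores must cover the same models")
    # The model an EMA predictor would select is the one it scores highest.
    selected = max(predicted_scores, key=predicted_scores.get)
    return max(true_scores.values()) - true_scores[selected]


def average_loss(targets):
    """Mean per-target loss over (true_scores, predicted_scores) pairs."""
    return mean(gdt_ts_loss(true, pred) for true, pred in targets)


# Hypothetical two-target example (made-up numbers).
targets = [
    # The predictor picks m2 (true 0.70) although m1 (true 0.80) is best.
    ({"m1": 0.80, "m2": 0.70, "m3": 0.50}, {"m1": 0.60, "m2": 0.75, "m3": 0.30}),
    # The predictor picks m2, which is also the best model, so the loss is zero.
    ({"m1": 0.40, "m2": 0.55}, {"m1": 0.20, "m2": 0.90}),
]

if __name__ == "__main__":
    print(f"average GDT-TS loss: {average_loss(targets):.3f}")
```

A loss of zero means the predictor always selected the best available model, so lower values indicate better model selection.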

Authors:
Chen, Xiao; Liu, Jian; Guo, Zhiye; Wu, Tianqi; Hou, Jie; Cheng, Jianlin
Publication Date:
2021-05-25
Research Org.:
Georgia Inst. of Technology, Atlanta, GA (United States). Georgia Tech Research Institute; Donald Danforth Plant Science Center, St. Louis, MO (United States)
Sponsoring Org.:
National Science Foundation (NSF); National Institutes of Health (NIH); USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1784581
Alternate Identifier(s):
OSTI ID: 1816544; OSTI ID: 1839182; OSTI ID: 2278965; OSTI ID: 2318539
Grant/Contract Number:  
SC0020400; SC0021303; DBI 1759934; IIS1763246; GM093123
Resource Type:
Published Article
Journal Name:
Scientific Reports
Additional Journal Information:
Journal Name: Scientific Reports; Journal Volume: 11; Journal Issue: 1; Journal ID: ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Chen, Xiao, Liu, Jian, Guo, Zhiye, Wu, Tianqi, Hou, Jie, and Cheng, Jianlin. Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. United Kingdom: N. p., 2021. Web. doi:10.1038/s41598-021-90303-6.
Chen, Xiao, Liu, Jian, Guo, Zhiye, Wu, Tianqi, Hou, Jie, & Cheng, Jianlin. Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. United Kingdom. https://doi.org/10.1038/s41598-021-90303-6
Chen, Xiao, Liu, Jian, Guo, Zhiye, Wu, Tianqi, Hou, Jie, and Cheng, Jianlin. 2021. "Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14". United Kingdom. https://doi.org/10.1038/s41598-021-90303-6.
@article{osti_1784581,
title = {Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14},
author = {Chen, Xiao and Liu, Jian and Guo, Zhiye and Wu, Tianqi and Hou, Jie and Cheng, Jianlin},
abstractNote = {Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage improved inter-residue distance predictions for EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. Evaluated on selecting the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieved average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods ranked first, second, and third among all 68 CASP14 predictors. MULTICOM-DEEP, our single-model EMA predictor, ranked within the top 10 among all single-model EMA methods according to GDT-TS loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.},
doi = {10.1038/s41598-021-90303-6},
journal = {Scientific Reports},
number = 1,
volume = 11,
place = {United Kingdom},
year = {2021},
month = {may}
}
