DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14

Journal Article · Proteins
DOI: https://doi.org/10.1002/prot.26186 · OSTI ID:1835382
 [1]; [1]; [1]; [2]; [1]
  1. Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
  2. Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA

Abstract:
Substantial progress in protein structure prediction has been made since CASP13 by utilizing deep learning and residue‐residue distance prediction. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and for a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), a significant improvement over the CASP13 MULTICOM predictor. Moreover, template‐free modeling performs better than template‐based modeling not only on hard targets but also on targets that have homologous templates. The performance of template‐free modeling depends largely on the accuracy of distance prediction, which is closely related to the quality of the multiple sequence alignments. Structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
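The abstract names distance-based model quality assessment only at a high level. As a rough illustration of the general idea, the sketch below scores a structural model by the fraction of residue pairs whose distances in the model agree with a predicted inter-residue distance map. It assumes the prediction is a single N×N matrix of distances in ångströms (real predictors often output probabilities over distance bins instead), and the function names, the 2 Å tolerance, the 16 Å cutoff, and the minimum sequence separation of 6 are hypothetical choices. This is not the MULTICOM scoring function.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between per-residue coordinates (e.g. C-beta atoms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def distance_agreement_score(
    model_coords: Sequence[Coord],
    predicted: Sequence[Sequence[float]],
    tolerance: float = 2.0,   # hypothetical: max |model - predicted| counted as agreement (Å)
    max_pred: float = 16.0,   # hypothetical: only score pairs predicted to be this close (Å)
    min_seq_sep: int = 6,     # hypothetical: skip pairs that are close in sequence
) -> float:
    """Fraction of scored residue pairs whose model distance is within `tolerance`
    of the predicted distance; returns 0.0 if no pair passes the filters."""
    model = distance_matrix(model_coords)
    n = len(model_coords)
    agree = total = 0
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if predicted[i][j] > max_pred:
                continue  # long predicted distances are usually less reliable
            total += 1
            if abs(model[i][j] - predicted[i][j]) <= tolerance:
                agree += 1
    return agree / total if total else 0.0


if __name__ == "__main__":
    # Toy 10-residue "native" chain along the x-axis with 3.8 Å spacing.
    native = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    predicted = distance_matrix(native)  # pretend the predictor was perfect
    stretched = [(5.7 * i, 0.0, 0.0) for i in range(10)]  # a distorted model
    print(distance_agreement_score(native, predicted, max_pred=50.0))     # 1.0
    print(distance_agreement_score(stretched, predicted, max_pred=50.0))  # 0.0
```

Ranking candidate models by such a score needs no consensus among models, which is one reason single-model, distance-based scores are attractive in the situation the abstract describes, where only a few good models exist for a hard target.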

Research Organization:
Donald Danforth Plant Science Center, St. Louis, MO (United States); University of Missouri, Columbia, MO (United States)
Sponsoring Organization:
National Institutes of Health (NIH); National Science Foundation (NSF); USDOE; USDOE Office of Science (SC); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC05-00OR22725; SC0020400; SC0021303
OSTI ID:
1835382
Journal Information:
Proteins, Vol. 90, Issue 1; ISSN 0887-3585
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United States
Language:
English