Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
Abstract: Substantial progress in protein structure prediction has been made since CASP13 by using deep learning and residue‐residue distance prediction. Inspired by these advances, we improved our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and for a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template‐free modeling performs better than template‐based modeling not only on hard targets but also on targets that have homologous templates. The performance of template‐free modeling depends largely on the accuracy of distance prediction, which in turn is closely related to the quality of the multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0 .
- Research Organization:
- Donald Danforth Plant Science Center, St. Louis, MO (United States); University of Missouri, Columbia, MO (United States)
- Sponsoring Organization:
- National Institutes of Health (NIH); National Science Foundation (NSF); USDOE; USDOE Office of Science (SC); USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Grant/Contract Number:
- AC05-00OR22725; SC0020400; SC0021303
- OSTI ID:
- 1835382
- Journal Information:
- Proteins, Vol. 90, Issue 1; ISSN 0887-3585
- Publisher:
- Wiley Blackwell (John Wiley & Sons)
- Country of Publication:
- United States
- Language:
- English