DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction

Abstract

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, which integrates template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, making it much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach for any protein.
Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
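
The single-model figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the pooled TM-score over all 96 domains is not reported in this record and is derived here on the assumption that the two averages cover disjoint sets of domains, and the per-class counts of correctly folded domains are inferred from the rounded percentages rather than stated. All variable and function names are hypothetical.

```python
# Figures quoted in the abstract (CASP14, one prediction per domain).
REGULAR_DOMAINS = 58       # TBM (regular) domains
HARD_DOMAINS = 38          # FM and FM/TBM (hard) domains
REGULAR_TM = 0.720         # average TM-score on regular domains
HARD_TM = 0.514            # average TM-score on hard domains
REGULAR_FOLD_RATE = 0.95   # fraction of regular domains with the correct fold
HARD_FOLD_RATE = 0.55      # fraction of hard domains with the correct fold


def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Size-weighted mean of two group averages over disjoint groups."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# Assumption: the two averages are over disjoint domain sets (96 in total).
overall_tm = pooled_mean(REGULAR_TM, REGULAR_DOMAINS, HARD_TM, HARD_DOMAINS)

# The percentages are rounded in the abstract, so round to whole domains.
correct_regular = round(REGULAR_FOLD_RATE * REGULAR_DOMAINS)
correct_hard = round(HARD_FOLD_RATE * HARD_DOMAINS)
correct_total = correct_regular + correct_hard

print(f"pooled TM-score over {REGULAR_DOMAINS + HARD_DOMAINS} domains: {overall_tm:.3f}")
print(f"domains with correct fold: {correct_total}")
```

Under these assumptions the inferred counts sum to the 76 domains stated in the abstract, which suggests the quoted percentages and the total are mutually consistent.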

Authors:
Wu, Tianqi [1]; Liu, Jian [1]; Guo, Zhiye [1]; Hou, Jie [2]; Cheng, Jianlin [1]
  1. Univ. of Missouri, Columbia, MO (United States). Dept. of Electrical Engineering and Computer Science
  2. St. Louis Univ., St. Louis, MO (United States)
Publication Date:
2021-06-23
Research Org.:
Univ. of Missouri, Columbia, MO (United States); Donald Danforth Plant Science Center, St. Louis, MO (United States); Georgia Institute of Technology, Atlanta, GA (United States)
Sponsoring Org.:
USDOE Advanced Research Projects Agency - Energy (ARPA-E); USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1848382
Alternate Identifier(s):
OSTI ID: 2278963; OSTI ID: 2318540
Grant/Contract Number:  
AR0001213; SC0020400; SC0021303
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Reports
Additional Journal Information:
Journal Volume: 11; Journal Issue: 1; Journal ID: ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Science & Technology - Other Topics; Protein structure predictions; Software

Citation Formats

Wu, Tianqi, Liu, Jian, Guo, Zhiye, Hou, Jie, and Cheng, Jianlin. MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction. United States: N. p., 2021. Web. doi:10.1038/s41598-021-92395-6.
Wu, Tianqi, Liu, Jian, Guo, Zhiye, Hou, Jie, & Cheng, Jianlin. MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction. United States. https://doi.org/10.1038/s41598-021-92395-6
Wu, Tianqi, Liu, Jian, Guo, Zhiye, Hou, Jie, and Cheng, Jianlin. 2021. "MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction". United States. https://doi.org/10.1038/s41598-021-92395-6. https://www.osti.gov/servlets/purl/1848382.
@article{osti_1848382,
title = {MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction},
author = {Wu, Tianqi and Liu, Jian and Guo, Zhiye and Hou, Jie and Cheng, Jianlin},
abstractNote = {Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system—MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. 
Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.},
doi = {10.1038/s41598-021-92395-6},
journal = {Scientific Reports},
number = 1,
volume = 11,
place = {United States},
year = {2021},
month = {jun}
}
