MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction
Abstract
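Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, making it much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made for a domain. The success rate is increased by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach to any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.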
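The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, the per-group counts are recovered by rounding the reported percentages, and the domain-weighted average TM-score over all 96 domains is a derived value that the abstract does not report.

```python
# Figures quoted in the abstract (CASP14 server results).
REGULAR_DOMAINS = 58      # TBM (regular) domains
HARD_DOMAINS = 38         # FM and FM/TBM (hard) domains
REGULAR_TM = 0.720        # average TM-score on regular domains
HARD_TM = 0.514           # average TM-score on hard domains
REGULAR_TOP1_RATE = 0.95  # correct-fold rate with one prediction per domain
HARD_TOP1_RATE = 0.55


def correct_fold_count(n_domains: int, rate: float) -> int:
    """Convert a rounded success rate back to a whole number of domains."""
    return round(n_domains * rate)


def weighted_mean_tm() -> float:
    """Domain-weighted average TM-score over all domains (derived, not reported)."""
    total = REGULAR_DOMAINS + HARD_DOMAINS
    return (REGULAR_TM * REGULAR_DOMAINS + HARD_TM * HARD_DOMAINS) / total


regular_ok = correct_fold_count(REGULAR_DOMAINS, REGULAR_TOP1_RATE)
hard_ok = correct_fold_count(HARD_DOMAINS, HARD_TOP1_RATE)

# The two groups should add up to the 76 correct-fold domains in the abstract.
print(regular_ok, hard_ok, regular_ok + hard_ok)  # prints: 55 21 76
print(round(weighted_mean_tm(), 3))               # prints: 0.638
```

The rounded percentages are consistent with the stated total of 76 correct-fold domains out of 96.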
- Authors:
- Wu, Tianqi; Liu, Jian; Guo, Zhiye; Hou, Jie; Cheng, Jianlin
- Univ. of Missouri, Columbia, MO (United States). Dept. of Electrical Engineering and Computer Science
- St. Louis Univ., St. Louis, MO (United States)
- Publication Date:
- 2021-06-23
- Research Org.:
- Univ. of Missouri, Columbia, MO (United States); Donald Danforth Plant Science Center, St. Louis, MO (United States); Georgia Institute of Technology, Atlanta, GA (United States)
- Sponsoring Org.:
- USDOE Advanced Research Projects Agency - Energy (ARPA-E); USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1848382
- Alternate Identifier(s):
- OSTI ID: 2278963; OSTI ID: 2318540
- Grant/Contract Number:
- AR0001213; SC0020400; SC0021303
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Scientific Reports
- Additional Journal Information:
- Journal Volume: 11; Journal Issue: 1; Journal ID: ISSN 2045-2322
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Science & Technology - Other Topics; Protein structure predictions; Software
Citation Formats
Wu, Tianqi, Liu, Jian, Guo, Zhiye, Hou, Jie, and Cheng, Jianlin. MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction. United States: N. p., 2021. Web. doi:10.1038/s41598-021-92395-6.
Wu, Tianqi, Liu, Jian, Guo, Zhiye, Hou, Jie, & Cheng, Jianlin. MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction. United States. https://doi.org/10.1038/s41598-021-92395-6
Wu, Tianqi, Liu, Jian, Guo, Zhiye, Hou, Jie, and Cheng, Jianlin. Wed Jun 23 00:00:00 EDT 2021. "MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction". United States. https://doi.org/10.1038/s41598-021-92395-6. https://www.osti.gov/servlets/purl/1848382.
@article{osti_1848382,
title = {MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction},
author = {Wu, Tianqi and Liu, Jian and Guo, Zhiye and Hou, Jie and Cheng, Jianlin},
abstractNote = {Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, making it much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made for a domain. The success rate is increased by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach to any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.},
doi = {10.1038/s41598-021-92395-6},
journal = {Scientific Reports},
number = 1,
volume = 11,
place = {United States},
year = {2021},
month = {6}
}