Complete fold annotation of the human proteome using a novel structural feature space
Abstract
Recognition of a protein's structural fold is the starting point for many structure prediction tools and for protein function inference. Fold prediction is computationally demanding, and recognizing novel folds is difficult, so the majority of proteins have not been annotated with a fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and for inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and to the identification of candidate novel fold families.
- Authors:
- Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
- Univ. of Pennsylvania, Philadelphia, PA (United States). Genomics and Computational Biology Program
- Univ. of Pennsylvania, Philadelphia, PA (United States). Dept. of Computer Science
- Univ. of Pennsylvania, Philadelphia, PA (United States). Genomics and Computational Biology Program; Univ. of Pennsylvania, Philadelphia, PA (United States). Dept. of Biology
- Publication Date:
- 2017-04-13
- Research Org.:
- Krell Institute, Ames, IA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1366516
- Grant/Contract Number:
- FG02-97ER25308
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Scientific Reports
- Additional Journal Information:
- Journal Volume: 7; Journal ID: ISSN 2045-2322
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Middleton, Sarah A., Illuminati, Joseph, and Kim, Junhyong. Complete fold annotation of the human proteome using a novel structural feature space. United States: N. p., 2017.
Web. doi:10.1038/srep46321.
Middleton, Sarah A., Illuminati, Joseph, & Kim, Junhyong. Complete fold annotation of the human proteome using a novel structural feature space. United States. https://doi.org/10.1038/srep46321
Middleton, Sarah A., Illuminati, Joseph, and Kim, Junhyong. 2017.
"Complete fold annotation of the human proteome using a novel structural feature space". United States. https://doi.org/10.1038/srep46321. https://www.osti.gov/servlets/purl/1366516.
@article{osti_1366516,
title = {Complete fold annotation of the human proteome using a novel structural feature space},
author = {Middleton, Sarah A. and Illuminati, Joseph and Kim, Junhyong},
abstractNote = {Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.},
doi = {10.1038/srep46321},
journal = {Scientific Reports},
volume = 7,
place = {United States},
year = {2017},
month = {4}
}