Title: Complete fold annotation of the human proteome using a novel structural feature space

Abstract

Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

Authors:
Middleton, Sarah A. [1]; Illuminati, Joseph [2]; Kim, Junhyong [3]
  1. Univ. of Pennsylvania, Philadelphia, PA (United States). Genomics and Computational Biology Program
  2. Univ. of Pennsylvania, Philadelphia, PA (United States). Dept. of Computer Science
  3. Univ. of Pennsylvania, Philadelphia, PA (United States). Genomics and Computational Biology Program; Univ. of Pennsylvania, Philadelphia, PA (United States). Dept. of Biology
Publication Date:
April 2017
Research Org.:
Krell Inst., Ames, IA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1366516
Grant/Contract Number:  
FG02-97ER25308
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Reports
Additional Journal Information:
Journal Volume: 7; Journal ID: ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Middleton, Sarah A., Illuminati, Joseph, and Kim, Junhyong. Complete fold annotation of the human proteome using a novel structural feature space. United States: N. p., 2017. Web. doi:10.1038/srep46321.
Middleton, Sarah A., Illuminati, Joseph, & Kim, Junhyong. Complete fold annotation of the human proteome using a novel structural feature space. United States. doi:10.1038/srep46321.
Middleton, Sarah A., Illuminati, Joseph, and Kim, Junhyong. "Complete fold annotation of the human proteome using a novel structural feature space". United States. doi:10.1038/srep46321. https://www.osti.gov/servlets/purl/1366516.
@article{osti_1366516,
title = {Complete fold annotation of the human proteome using a novel structural feature space},
author = {Middleton, Sarah A. and Illuminati, Joseph and Kim, Junhyong},
abstractNote = {Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.},
doi = {10.1038/srep46321},
journal = {Scientific Reports},
volume = 7,
place = {United States},
year = {2017},
month = {4}
}
