OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins

Abstract

The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths (L) and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here) from the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct from each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.
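
The fingerprint idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fingerprint` and `fingerprint_distance`, the grid size, the ln(L) range, the use of a disorder fraction in [0, 1] for ID, and the Euclidean distance are all assumptions made here for illustration. The abstract does not state how densities were binned or compared.

```python
import math


def fingerprint(proteins, l_range=(3.0, 9.0), n_l=12, n_id=10):
    """Return a normalized 2D density grid over (ln L, ID).

    proteins: iterable of (length, disorder_fraction) pairs, where
    disorder_fraction is assumed to lie in [0, 1].
    l_range, n_l, n_id: illustrative binning choices, not from the paper.
    """
    lo, hi = l_range
    grid = [[0.0] * n_id for _ in range(n_l)]
    total = 0
    for length, disorder in proteins:
        x = math.log(length)
        # Clamp indices so out-of-range proteins fall into the border bins.
        i = min(max(int((x - lo) / (hi - lo) * n_l), 0), n_l - 1)
        j = min(max(int(disorder * n_id), 0), n_id - 1)
        grid[i][j] += 1.0
        total += 1
    if total == 0:
        raise ValueError("at least one protein is required")
    # Normalize counts to a density so proteomes of different sizes compare.
    return [[count / total for count in row] for row in grid]


def fingerprint_distance(a, b):
    """Euclidean distance between two fingerprints of the same shape."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


# Toy (length, disorder fraction) pairs; these values are made up.
toy_a = [(100, 0.05), (100, 0.05), (1000, 0.95)]
toy_b = [(250, 0.10), (400, 0.30), (1200, 0.60)]

fp_a = fingerprint(toy_a)
fp_b = fingerprint(toy_b)
d_ab = fingerprint_distance(fp_a, fp_b)
```

Pairwise distances of this kind, computed over many proteomes, form a distance matrix that a standard distance-based tree method could take as input. Which tree-building method the authors used is not stated in this record.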

Authors:
Guo, Hao-Bo [1]; Ma, Yue [1]; Tuskan, Gerald A. [2]; Yang, Xiaohan [2]; Guo, Hong [1]
  1. Univ. of Tennessee, Knoxville, TN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
2018-03-04
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
OSTI Identifier:
1423699
Alternate Identifier(s):
OSTI ID: 1468038
Grant/Contract Number:  
AC05-00OR22725; SC0008834
Resource Type:
Journal Article: Published Article
Journal Name:
International Journal of Genomics
Additional Journal Information:
Journal Volume: 2018; Journal Issue: n/a; Journal ID: ISSN 2314-436X
Publisher:
Hindawi
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Guo, Hao-Bo, Ma, Yue, Tuskan, Gerald A., Yang, Xiaohan, and Guo, Hong. Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins. United States: N. p., 2018. Web. doi:10.1155/2018/9784161.
Guo, Hao-Bo, Ma, Yue, Tuskan, Gerald A., Yang, Xiaohan, & Guo, Hong. Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins. United States. doi:10.1155/2018/9784161.
Guo, Hao-Bo, Ma, Yue, Tuskan, Gerald A., Yang, Xiaohan, and Guo, Hong. 2018. "Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins". United States. doi:10.1155/2018/9784161.
@article{osti_1423699,
title = {Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins},
author = {Guo, Hao-Bo and Ma, Yue and Tuskan, Gerald A. and Yang, Xiaohan and Guo, Hong},
abstractNote = {The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths (L) and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here) from the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct from each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.},
doi = {10.1155/2018/9784161},
journal = {International Journal of Genomics},
number = {n/a},
volume = 2018,
place = {United States},
year = {2018},
month = mar
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at 10.1155/2018/9784161
