OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Discovery and annotation of small proteins using genomics, proteomics, and computational approaches

Abstract

Small proteins (10–200 amino acids (AA) in length) encoded by short open reading frames (sORFs) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained ~2.6 million expressed sequence tag (EST) reads from the Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10–200 AA in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: 1) coding-potential prediction, 2) evolutionary conservation between P. deltoides and other plant species, and 3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1,469 genes was obtained. Analysis of the protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. In the high-confidence sORF candidate set, known protein domains were identified in 1,282 genes (higher-confidence sORF candidate set), out of which 611 genes, designated as the highest-confidence candidate sORF set, were also supported by proteomics data. This study not only demonstrates that there are potential sORF candidates to be annotated in sequenced genomes, but also presents an efficient strategy for discovery of sORFs in species with no genome annotation yet available.
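
The length criterion in the abstract (ORFs encoding proteins of 10–200 AA) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_sorfs`, the six-frame scan, the requirement of an ATG start codon, and the use of the standard stop codons are all assumptions made for illustration. The record does not describe how the sORFs were actually called, and the three enrichment steps are not modeled here.

```python
# Hypothetical illustration only: a naive six-frame ORF scan with a length filter.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sorfs(seq, min_aa=10, max_aa=200):
    """Return (strand, start, end, n_aa) tuples for ATG..stop ORFs.

    n_aa counts encoded residues (start codon included, stop excluded).
    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (for "-", relative to the reverse complement).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if min_aa <= n_aa <= max_aa:
                        hits.append((strand, start, i + 3, n_aa))
                    start = None
    return hits


if __name__ == "__main__":
    # Toy transcript: Met followed by 11 Ala codons, then a stop codon.
    toy = "ATG" + "GCT" * 11 + "TAA"
    print(find_sorfs(toy))
```

A scan like this only produces raw candidates; in the study, the initial candidate set was then narrowed using coding-potential prediction, cross-species conservation, and gene family clustering.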

Authors:
 [1];  [1];  [1];  [1];  [1];  [1];  [1];  [1];  [1];  [2];  [1]
  1. ORNL
  2. U.S. Department of Energy, Joint Genome Institute
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1023288
DOE Contract Number:  
DE-AC05-00OR22725
Resource Type:
Journal Article
Journal Name:
Genome Research
Additional Journal Information:
Journal Volume: 21; Journal Issue: 4; Journal ID: ISSN 1088-9051
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 60 APPLIED LIFE SCIENCES; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; AMINO ACIDS; DISTRIBUTION; FORECASTING; GENES; HORMONES; MASS SPECTROSCOPY; NEOPLASMS; PROTEINS; RNA

Citation Formats

Yang, Xiaohan, Tschaplinski, Timothy J, Hurst, Gregory, Jawdy, Sara, Abraham, Paul E, Lankford, Patricia K, Adams, Rachel M, Shah, Manesh B, Hettich, Robert, Kalluri, Udaya C, Gunter, Lee E, Pennacchio, Christa, and Tuskan, Gerald A. Discovery and annotation of small proteins using genomics, proteomics, and computational approaches. United States: N. p., 2011. Web. doi:10.1101/gr.109280.110.
Yang, Xiaohan, Tschaplinski, Timothy J, Hurst, Gregory, Jawdy, Sara, Abraham, Paul E, Lankford, Patricia K, Adams, Rachel M, Shah, Manesh B, Hettich, Robert, Kalluri, Udaya C, Gunter, Lee E, Pennacchio, Christa, & Tuskan, Gerald A. Discovery and annotation of small proteins using genomics, proteomics, and computational approaches. United States. https://doi.org/10.1101/gr.109280.110
Yang, Xiaohan, Tschaplinski, Timothy J, Hurst, Gregory, Jawdy, Sara, Abraham, Paul E, Lankford, Patricia K, Adams, Rachel M, Shah, Manesh B, Hettich, Robert, Kalluri, Udaya C, Gunter, Lee E, Pennacchio, Christa, and Tuskan, Gerald A. 2011. "Discovery and annotation of small proteins using genomics, proteomics, and computational approaches". United States. https://doi.org/10.1101/gr.109280.110.
@article{osti_1023288,
title = {Discovery and annotation of small proteins using genomics, proteomics, and computational approaches},
author = {Yang, Xiaohan and Tschaplinski, Timothy J and Hurst, Gregory and Jawdy, Sara and Abraham, Paul E and Lankford, Patricia K and Adams, Rachel M and Shah, Manesh B and Hettich, Robert and Kalluri, Udaya C and Gunter, Lee E and Pennacchio, Christa and Tuskan, Gerald A},
abstractNote = {Small proteins (10--200 amino acids (AA) in length) encoded by short open reading frames (sORFs) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained ~2.6 million expressed sequence tag (EST) reads from the Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10--200 AA in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: 1) coding-potential prediction, 2) evolutionary conservation between P. deltoides and other plant species, and 3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1,469 genes was obtained. Analysis of the protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. In the high-confidence sORF candidate set, known protein domains were identified in 1,282 genes (higher-confidence sORF candidate set), out of which 611 genes, designated as the highest-confidence candidate sORF set, were also supported by proteomics data. This study not only demonstrates that there are potential sORF candidates to be annotated in sequenced genomes, but also presents an efficient strategy for discovery of sORFs in species with no genome annotation yet available.},
doi = {10.1101/gr.109280.110},
url = {https://www.osti.gov/biblio/1023288},
journal = {Genome Research},
issn = {1088-9051},
number = 4,
volume = 21,
place = {United States},
year = {2011},
month = {jan}
}