OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

Abstract

The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
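
The abstract names two kinds of parameters, expression-level continuity and variance, but this record does not define them. The following is only a minimal sketch of what such features might look like for the region between two adjacent same-strand genes. It is not the SeqTU implementation. The function name `gap_features`, the per-base coverage list, and both feature definitions (fraction of covered bases, population variance of depth) are assumptions made for illustration.

```python
from statistics import pvariance


def gap_features(coverage, gene_a_end, gene_b_start):
    """Hypothetical features for the gap between two adjacent genes.

    coverage     -- per-base read depth on one strand (list of ints)
    gene_a_end   -- index one past the last base of the upstream gene
    gene_b_start -- index of the first base of the downstream gene

    Both feature definitions are illustrative assumptions, not the
    definitions used by SeqTU.
    """
    gap = coverage[gene_a_end:gene_b_start]
    if not gap:
        # Adjacent or overlapping genes: no gap to interrupt expression.
        return {"continuity": 1.0, "variance": 0.0}
    covered = sum(1 for depth in gap if depth > 0)
    return {
        # Fraction of gap positions with at least one read.
        "continuity": covered / len(gap),
        # Population variance of read depth across the gap.
        "variance": pvariance(gap),
    }


# Toy coverage track: two "genes" separated by a partly uncovered gap.
toy_coverage = [5, 5, 5, 4, 4, 0, 0, 3, 5, 5]
features = gap_features(toy_coverage, gene_a_end=3, gene_b_start=8)
```

Under these assumptions, a gap with continuity near 1 and low variance would be weak evidence that the two genes share a transcript. In a machine-learning setting such values would be inputs to a trained classifier rather than thresholds applied directly.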

Authors:
Chou, Wen-Chi [1]; Ma, Qin [1]; Yang, Shihui [2]; Cao, Sha [3]; Klingeman, Dawn M. [4]; Brown, Steven D. [5]; Xu, Ying [6]
  1. Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
  2. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); National Renewable Energy Lab. (NREL), Golden, CO (United States)
  3. Univ. of Georgia, Athens, GA (United States)
  4. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  5. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  6. Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States); Jilin Univ., Changchun (China)
Publication Date:
2015-03-12
Research Org.:
National Renewable Energy Lab. (NREL), Golden, CO (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). BioEnergy Science Center (BESC)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
OSTI Identifier:
1242033
Alternate Identifier(s):
OSTI ID: 1265796
Report Number(s):
NREL/JA-5100-64668
Journal ID: ISSN 0305-1048
Grant/Contract Number:
AC36-08GO28308; AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Nucleic Acids Research
Additional Journal Information:
Journal Volume: 43; Journal Issue: 10; Related Information: Nucleic Acids Research; Journal ID: ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
09 BIOMASS FUELS; 59 BASIC BIOLOGICAL SCIENCES; transcription units (TU); bacterial genome

Citation Formats

Chou, Wen-Chi, Ma, Qin, Yang, Shihui, Cao, Sha, Klingeman, Dawn M., Brown, Steven D., and Xu, Ying. Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum. United States: N. p., 2015. Web. doi:10.1093/nar/gkv177.
Chou, Wen-Chi, Ma, Qin, Yang, Shihui, Cao, Sha, Klingeman, Dawn M., Brown, Steven D., & Xu, Ying. Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum. United States. doi:10.1093/nar/gkv177.
Chou, Wen-Chi, Ma, Qin, Yang, Shihui, Cao, Sha, Klingeman, Dawn M., Brown, Steven D., and Xu, Ying. 2015. "Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum". United States. doi:10.1093/nar/gkv177. https://www.osti.gov/servlets/purl/1242033.
@article{osti_1242033,
title = {Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum},
author = {Chou, Wen-Chi and Ma, Qin and Yang, Shihui and Cao, Sha and Klingeman, Dawn M. and Brown, Steven D. and Xu, Ying},
abstractNote = {The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.},
doi = {10.1093/nar/gkv177},
journal = {Nucleic Acids Research},
number = 10,
volume = 43,
place = {United States},
year = {2015},
month = {3}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 5 works
Citation information provided by Web of Science