Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum
- Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
- BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); National Renewable Energy Lab. (NREL), Golden, CO (United States)
- Univ. of Georgia, Athens, GA (United States)
- BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States); Jilin Univ., Changchun (China)
Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
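The abstract names two kinds of parameters, expression-level continuity and variance, but does not define them. The following is a minimal sketch of how such features might be computed from per-base read coverage on one strand. It is not the SeqTU implementation. The function name `region_features`, the definition of continuity as the fraction of covered positions, and the mean-normalized variance are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def region_features(coverage, start, end):
    """Return (continuity, variance) for coverage[start:end].

    Hypothetical definitions, NOT taken from SeqTU:
      continuity = fraction of positions with non-zero read depth
      variance   = population variance of depth divided by squared mean depth
    """
    window = coverage[start:end]
    if not window:
        raise ValueError("region is empty")

    # Share of positions in the region that have at least one read.
    continuity = sum(1 for depth in window if depth > 0) / len(window)

    # Normalize by the squared mean so that highly and lowly expressed
    # regions are comparable; a region with no reads gets 0.0.
    avg = mean(window)
    variance = pvariance(window) / (avg * avg) if avg > 0 else 0.0

    return continuity, variance


# Toy per-base coverage for one strand (made-up numbers).
coverage = [10, 12, 11, 0, 0, 9, 10, 11]
gene_a = region_features(coverage, 0, 3)  # evenly covered region
gap = region_features(coverage, 3, 5)     # region with no reads
```

In this sketch, a gap between two genes with continuity near 1 and low variance would be a candidate for joining the genes into one TU, while a gap with no coverage would suggest a boundary. A trained classifier, rather than fixed thresholds, would make that call in a machine-learning setting.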
- Research Organization:
- National Renewable Energy Laboratory (NREL), Golden, CO (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). BioEnergy Science Center (BESC)
- Sponsoring Organization:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- Grant/Contract Number:
- AC36-08GO28308; AC05-00OR22725
- OSTI ID:
- 1242033
- Alternate ID(s):
- OSTI ID: 1265796
- Report Number(s):
- NREL/JA-5100-64668
- Journal Information:
- Nucleic Acids Research, Vol. 43, Issue 10; Related Information: Nucleic Acids Research; ISSN 0305-1048
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English
Similar Records
Global transcriptome analysis of Clostridium thermocellum ATCC 27405 during growth on dilute acid pretreated Populus and switchgrass
Genome-wide Transcription Factor DNA Binding Sites and Gene Regulatory Networks in Clostridium thermocellum