U.S. Department of Energy
Office of Scientific and Technical Information

Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gkv177 · OSTI ID: 1242033
 [1];  [1];  [2];  [3];  [4];  [4];  [5]
  1. Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
  2. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); National Renewable Energy Lab. (NREL), Golden, CO (United States)
  3. Univ. of Georgia, Athens, GA (United States)
  4. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  5. Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States); Jilin Univ., Changchun (China)
The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidating the transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs were predicted from the four RNA-seq datasets, and 44% of the predicted TUs contain multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads, and the evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we also applied it to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program, named SeqTU, is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
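As a rough illustration of the two feature families named in the abstract (expression-level continuity and variance), the following is a minimal sketch and not the SeqTU implementation. The function name `region_features`, the toy coverage values, and the specific definitions used here (fraction of covered positions for continuity, coefficient of variation for variance) are assumptions made for illustration only; the abstract does not define these parameters.

```python
from statistics import mean, pstdev


def region_features(coverage, start, end):
    """Return (continuity, variation) for coverage[start:end].

    continuity: fraction of positions with at least one read
                (hypothetical definition, not taken from the paper).
    variation:  coefficient of variation of per-base read depth
                (hypothetical definition, not taken from the paper).
    """
    window = coverage[start:end]
    if not window:
        raise ValueError("empty region")
    continuity = sum(1 for depth in window if depth > 0) / len(window)
    avg = mean(window)
    variation = pstdev(window) / avg if avg > 0 else 0.0
    return continuity, variation


# Toy per-base read depth on one strand (made-up numbers).
coverage = [12, 11, 13, 12, 10, 9, 11, 12, 0, 0, 0, 0, 8, 9, 7, 8]

# A region that is fully covered at a steady depth.
covered_gap = region_features(coverage, 4, 8)

# A region with no reads at all.
empty_gap = region_features(coverage, 8, 12)
```

Under these assumed definitions, a region lying inside a single TU would tend to show high continuity and low variation, while a boundary between TUs would tend to show a drop in continuity. In a full pipeline, features like these would be passed to a trained classifier; that step is omitted here.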
Research Organization:
National Renewable Energy Lab. (NREL), Golden, CO (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). BioEnergy Science Center (BESC)
Sponsoring Organization:
USDOE Office of Science (SC); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
Grant/Contract Number:
AC05-00OR22725; AC36-08GO28308
OSTI ID:
1242033
Alternate ID(s):
OSTI ID: 1265796
Report Number(s):
NREL/JA--5100-64668
Journal Information:
Journal Name: Nucleic Acids Research; Journal Volume: 43; Journal Issue: 10; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English

Cited By (11)

SeqTU: A Web Server for Identification of Bacterial Transcription Units journal March 2017
A machine learning classifier trained on cancer transcriptomes detects NF1 inactivation signal in glioblastoma posted_content December 2016
RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis journal March 2018
Revisiting operons: an analysis of the landscape of transcriptional units in E. coli journal November 2015
A New Machine Learning-Based Framework for Mapping Uncertainty Analysis in RNA-Seq Read Alignment and Gene Expression Estimation journal August 2018
rSeqTU—A Machine-Learning Based R Package for Prediction of Bacterial Transcription Units journal May 2019
Single-Cell RNA Sequencing of Plant-Associated Bacterial Communities journal October 2019
Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses journal March 2016
DOOR: a prokaryotic operon database for genome analyses and functional inference journal July 2017
A machine learning classifier trained on cancer transcriptomes detects NF1 inactivation signal in glioblastoma journal February 2017
RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis journal May 2018
