OSTI.GOV · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gkv177 · OSTI ID: 1242033
 [1];  [1];  [2];  [3];  [4];  [5];  [6]
  1. Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
  2. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); National Renewable Energy Lab. (NREL), Golden, CO (United States)
  3. Univ. of Georgia, Athens, GA (United States)
  4. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  5. BioEnergy Science Center, Oak Ridge, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  6. Univ. of Georgia, Athens, GA (United States); BioEnergy Science Center, Oak Ridge, TN (United States); Jilin Univ., Changchun (China)

Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
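The abstract names two kinds of signal, expression-level continuity and variance, but does not give their exact definitions or the classifier used. The following is therefore only a minimal sketch under stated assumptions: the per-base coverage input, the feature formulas in the hypothetical `gap_features` function, and the cutoffs in the hypothetical `same_tu` function are all illustrative, and a fixed threshold rule stands in for the trained machine-learning model. It is not SeqTU's implementation.

```python
from statistics import mean, pstdev


def gap_features(coverage, gene_a, gene_b):
    """Return (continuity, variation) for two adjacent same-strand genes.

    coverage: per-base read depths for one strand (assumed input format).
    gene_a, gene_b: (start, end) half-open index pairs, gene_a upstream.
    Both feature definitions are assumptions made for this sketch.
    """
    a_start, a_end = gene_a
    b_start, b_end = gene_b

    # Continuity: fraction of intergenic positions covered by >= 1 read.
    gap = coverage[a_end:b_start]
    continuity = sum(1 for d in gap if d > 0) / len(gap) if gap else 1.0

    # Variation: coefficient of variation of depth over the whole span.
    span = coverage[a_start:b_end]
    span_mean = mean(span)
    variation = pstdev(span) / span_mean if span_mean > 0 else float("inf")

    return continuity, variation


def same_tu(coverage, gene_a, gene_b, min_continuity=0.9, max_variation=0.5):
    """Threshold rule standing in for a trained classifier (assumed cutoffs)."""
    continuity, variation = gap_features(coverage, gene_a, gene_b)
    return continuity >= min_continuity and variation <= max_variation


# Toy data: continuous coverage across the gap versus a gap with no reads.
continuous = [10] * 5 + [9] * 3 + [10] * 5
broken = [10] * 5 + [0] * 3 + [10] * 5
print(same_tu(continuous, (0, 5), (8, 13)))
print(same_tu(broken, (0, 5), (8, 13)))
```

In a real pipeline, the two features would be computed per strand from the ssRNA-seq alignments and passed to a trained model rather than compared against hand-picked cutoffs.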

Research Organization:
National Renewable Energy Laboratory (NREL), Golden, CO (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). BioEnergy Science Center (BESC)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC36-08GO28308; AC05-00OR22725
OSTI ID:
1242033
Alternate ID(s):
OSTI ID: 1265796
Report Number(s):
NREL/JA-5100-64668
Journal Information:
Nucleic Acids Research, Vol. 43, Issue 10; Related Information: Nucleic Acids Research; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 17 works
Citation information provided by Web of Science

References (34)

The transcription unit architecture of the Escherichia coli genome · journal · November 2009
Genome-wide operon prediction in Staphylococcus aureus · journal · July 2004
ODB: a database of operons accumulating known operons across multiple genomes · journal · January 2006
DBTBS: a database of transcriptional regulation in Bacillus subtilis containing upstream intergenic conservation information · journal · October 2007
OperonDB: a comprehensive database of predicted operons in microbial genomes · journal · January 2009
DOOR: a database for prokaryotic operons · journal · November 2008
DOOR 2.0: presenting operons and their functions through dynamic and integrated views · journal · November 2013
Mycoplasma hyopneumoniae Transcription Unit Organization: Genome Survey and Prediction · journal · November 2011
The relative value of operon predictions · journal · April 2008
Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing · journal · February 2009
Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs · journal · January 2009
Deep sequencing-based discovery of the Chlamydia trachomatis transcriptome · journal · November 2009
Review Application of RNA-seq to reveal the transcript profile in bacteria · journal · January 2011
Computational analysis of bacterial RNA-Seq data · journal · May 2013
Transcriptome dynamics-based operon prediction in prokaryotes · journal · May 2014
Clostridium thermocellum ATCC27405 transcriptomic, metabolomic and proteomic profiles after ethanol stress · journal · January 2012
Solexa Ltd · journal · June 2004
Genome sequencing in microfabricated high-density picolitre reactors · journal · July 2005
A new framework for identifying cis-regulatory motifs in prokaryotes · journal · December 2010
Minimal metabolic pathway structure is consistent with associated biomolecular interactions · journal · July 2014
RNA degradome--its biogenesis and functions · journal · June 2011
Transcriptome Complexity in a Genome-Reduced Bacterium · journal · November 2009
The Listeria transcriptional landscape from saprophytism to virulence · journal · May 2009
LIBSVM: A library for support vector machines · journal · April 2011
Interruptions in gene expression drive highly expressed operons to the leading strand of DNA replication · journal · June 2005
The percentage of bacterial genes on leading versus lagging strands is influenced by multiple balancing forces · journal · June 2012
An integrated toolkit for accurate prediction and analysis of cis-regulatory motifs at a genome scale · journal · July 2013
RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more · journal · November 2012
RegTransBase – a database of regulatory sequences and interactions based on literature: a resource for investigating transcriptional regulation in prokaryotes · journal · January 2013
DMINDA: an integrated web server for DNA motif identification and analyses · journal · April 2014
Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake · journal · January 2007
Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists · journal · November 2008
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336” · journal · January 2012
The primary transcriptome of the major human pathogen Helicobacter pylori · journal · February 2010

Cited By (11)

Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses · journal · March 2016
DOOR: a prokaryotic operon database for genome analyses and functional inference · journal · July 2017
Revisiting operons: an analysis of the landscape of transcriptional units in E. coli · journal · November 2015
A machine learning classifier trained on cancer transcriptomes detects NF1 inactivation signal in glioblastoma · journal · February 2017
A New Machine Learning-Based Framework for Mapping Uncertainty Analysis in RNA-Seq Read Alignment and Gene Expression Estimation · journal · August 2018
Single-Cell RNA Sequencing of Plant-Associated Bacterial Communities · journal · October 2019
RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis · journal · May 2018
A machine learning classifier trained on cancer transcriptomes detects NF1 inactivation signal in glioblastoma · posted_content · December 2016
SeqTU: A Web Server for Identification of Bacterial Transcription Units · journal · March 2017
RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis · journal · March 2018
rSeqTU—A Machine-Learning Based R Package for Prediction of Bacterial Transcription Units · journal · May 2019