U.S. Department of Energy
Office of Scientific and Technical Information

OperonSEQer: A set of machine-learning algorithms with threshold voting for detection of operon pairs using short-read RNA-sequencing data

Journal Article · PLoS Computational Biology (Online)

Operon prediction in prokaryotes is critical not only for understanding the regulation of endogenous gene expression, but also for exogenous targeting of genes using newly developed tools such as CRISPR-based gene modulation. A number of methods have used transcriptomics data to predict operons, based on the premise that contiguous genes in an operon will be expressed at similar levels. While promising results have been observed using these methods, most of them do not address uncertainty caused by technical variability between experiments, which is especially relevant when the amount of data available is small. In addition, many existing methods do not provide the flexibility to determine the stringency with which genes should be evaluated for being in an operon pair. We present OperonSEQer, a set of machine-learning algorithms that uses the statistic and p-value from a non-parametric analysis of variance test (Kruskal-Wallis) to determine the likelihood that two adjacent genes are expressed from the same RNA molecule. We implement a voting system that lets users choose the stringency of operon calls depending on whether their priority is high recall or high specificity. In addition, we provide the code so that users can retrain the algorithms and re-establish hyperparameters based on any data they choose, allowing this method to be expanded as additional data are generated. We show that our approach detects operon pairs that are missed by current methods by comparing our predictions to publicly available long-read sequencing data. OperonSEQer therefore improves on existing methods in terms of accuracy, flexibility, and adaptability.
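The two ideas named in the abstract, a Kruskal-Wallis statistic with its p-value and a threshold vote across several models, can be illustrated with a minimal sketch. This is not the OperonSEQer code. It assumes, purely for illustration, that the test compares three groups of per-base coverage values (gene A, the intergenic region, gene B), which makes the chi-squared p-value a closed form for two degrees of freedom. The function names, the coverage numbers, and the vote list are all hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three(a, b, c):
    """Kruskal-Wallis H (tie-corrected) and p-value for exactly three groups.

    With three groups the reference chi-squared distribution has 2 degrees
    of freedom, whose survival function is exp(-H / 2).
    """
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Standard tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)
    return h, math.exp(-h / 2)


def call_operon_pair(votes, threshold):
    """Threshold vote: call an operon pair if at least `threshold` models agree."""
    return sum(votes) >= threshold


# Hypothetical per-base coverage for gene A, the intergenic region, and gene B.
h_stat, p_value = kruskal_wallis_three([1, 2, 3], [4, 5, 6], [7, 8, 9])

# Hypothetical yes/no calls (1 = same operon) from six trained models.
votes = [1, 1, 0, 1, 0, 0]
lenient_call = call_operon_pair(votes, threshold=3)  # favors recall
strict_call = call_operon_pair(votes, threshold=5)   # favors specificity
```

Raising the threshold demands agreement from more models, which trades recall for specificity. Lowering it does the opposite. This is the stringency choice the abstract describes.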

Sponsoring Organization:
USDOE
OSTI ID:
1840808
Alternate ID(s):
OSTI ID: 1838398
OSTI ID: 1841991
Journal Information:
PLoS Computational Biology (Online), Vol. 18, Issue 1; ISSN 1553-7358
Publisher:
Public Library of Science (PLoS)
Country of Publication:
United States
Language:
English
