U.S. Department of Energy
Office of Scientific and Technical Information

Detecting operons in bacterial genomes via visual representation learning

Journal Article · Scientific Reports
 [1];  [2];  [3]
  1. Univ. of Chicago, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. Univ. of Chicago, IL (United States). Consortium for Advanced Science and Engineering; Argonne National Lab. (ANL), Argonne, IL (United States)
Contiguous genes in prokaryotes are often arranged into operons. Detecting operons plays a critical role in inferring gene functionality and regulatory networks. Human experts annotate operons by visually inspecting gene neighborhoods across pileups of related genomes. These visual representations capture inter-genic distance, strand direction, gene size, functional relatedness, and gene neighborhood conservation, which are the most prominent operon features mentioned in the literature. By studying these features, an expert can decide whether a genomic region is part of an operon. We propose a deep learning-based method named Operon Hunter that uses visual representations of genomic fragments to make operon predictions. Transfer learning and data augmentation make it possible to leverage powerful neural networks trained on image datasets by re-training them on a more limited dataset of extensively validated operons. Our method outperforms the previously reported state-of-the-art tools, especially in predicting full operons and their boundaries accurately. Furthermore, our approach makes it possible to visually identify the features influencing the network’s decisions, so that human experts can subsequently cross-check them.
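As a minimal sketch of two of the features named in the abstract (inter-genic distance and strand direction), the Python snippet below computes them for adjacent gene pairs. The `Gene` record, the `adjacent_pair_features` function, and the toy coordinates are assumptions made for illustration only. This is not the Operon Hunter implementation, which the abstract describes as working on visual representations of genomic fragments rather than on feature tuples.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def adjacent_pair_features(genes: List[Gene]) -> List[Tuple[str, str, int, bool]]:
    """Return (left, right, intergenic_distance, same_strand) per adjacent pair.

    A negative distance means the two genes overlap.
    """
    # Order genes by start coordinate so neighbors are adjacent in the list.
    ordered = sorted(genes, key=lambda g: g.start)
    features = []
    for left, right in zip(ordered, ordered[1:]):
        # Number of bases strictly between the two genes.
        distance = right.start - left.end - 1
        features.append(
            (left.name, right.name, distance, left.strand == right.strand)
        )
    return features


# Toy coordinates invented for illustration, not taken from any genome.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 420, 900, "+"),
    Gene("geneC", 1500, 2100, "-"),
]
pair_features = adjacent_pair_features(genes)
```

In this toy input, a short gap on the same strand (geneA, geneB) is the kind of pattern an annotator would treat as operon-like, whereas a long gap with a strand switch (geneB, geneC) is not. Any actual decision threshold would be an additional assumption and is deliberately left out.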
Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE; National Institute of Allergy and Infectious Diseases (NIAID); National Institutes of Health (NIH)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1785368
Journal Information:
Scientific Reports, Vol. 11, Issue 1; ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
