DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Escherichia coli transcriptome mostly consists of independently regulated modules

Abstract

Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome.

Authors:
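Sastry, Anand V. [1]; Gao, Ye [1]; Szubin, Richard [1]; Hefner, Ying [1]; Xu, Sibei [1]; Kim, Donghyuk [1]; Choudhary, Kumari Sonal [1]; Yang, Laurence [1]; King, Zachary A. [1]; Palsson, Bernhard O. [2]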
  1. Univ. of California San Diego, La Jolla, CA (United States)
  2. Univ. of California San Diego, La Jolla, CA (United States); Novo Nordisk Foundation Center for Biosustainability, Lyngby (Denmark)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Org.:
USDOE Office of Science (SC); Novo Nordisk Foundation Center for Biosustainability
OSTI Identifier:
1624220
Grant/Contract Number:  
AC02-05CH11231; NNF10CC1016517
Resource Type:
Accepted Manuscript
Journal Name:
Nature Communications
Additional Journal Information:
Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 2041-1723
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; science & technology - other topics; bacterial systems biology; data processing; gene regulatory networks; machine learning; regulatory networks

Citation Formats

Sastry, Anand V., Gao, Ye, Szubin, Richard, Hefner, Ying, Xu, Sibei, Kim, Donghyuk, Choudhary, Kumari Sonal, Yang, Laurence, King, Zachary A., and Palsson, Bernhard O. The Escherichia coli transcriptome mostly consists of independently regulated modules. United States: N. p., 2019. Web. https://doi.org/10.1038/s41467-019-13483-w.
Sastry, Anand V., Gao, Ye, Szubin, Richard, Hefner, Ying, Xu, Sibei, Kim, Donghyuk, Choudhary, Kumari Sonal, Yang, Laurence, King, Zachary A., & Palsson, Bernhard O. The Escherichia coli transcriptome mostly consists of independently regulated modules. United States. https://doi.org/10.1038/s41467-019-13483-w
Sastry, Anand V., Gao, Ye, Szubin, Richard, Hefner, Ying, Xu, Sibei, Kim, Donghyuk, Choudhary, Kumari Sonal, Yang, Laurence, King, Zachary A., and Palsson, Bernhard O. 2019. "The Escherichia coli transcriptome mostly consists of independently regulated modules". United States. https://doi.org/10.1038/s41467-019-13483-w. https://www.osti.gov/servlets/purl/1624220.
@article{osti_1624220,
title = {The Escherichia coli transcriptome mostly consists of independently regulated modules},
author = {Sastry, Anand V. and Gao, Ye and Szubin, Richard and Hefner, Ying and Xu, Sibei and Kim, Donghyuk and Choudhary, Kumari Sonal and Yang, Laurence and King, Zachary A. and Palsson, Bernhard O.},
abstractNote = {Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome.},
doi = {10.1038/s41467-019-13483-w},
journal = {Nature Communications},
number = 1,
volume = 10,
place = {United States},
year = {2019},
month = {12}
}

Citation Metrics:
Cited by: 3 works
Citation information provided by
Web of Science

Figures / Tables:

Fig. 1: ICA extracts regulatory signals from expression data. a Given three microphones recording three people speaking simultaneously, each microphone records each voice (i.e. signal) at different volumes (i.e. signal strengths) based on their relative distances. Using only these measured mixed signals, ICA recovers the original signals and their relative signal strengths by maximizing the statistical independence of the recovered signals. The mixed signals (X) are a linear combination of the matrix of recovered source signals (S) and the mixing matrix (A) that represents the relative strength of each source signal in the mixed output signals. This relationship is mathematically described as X = SA. b An expression profile under a specific condition can be likened to a microphone in a cell, measuring the combined effects of all transcriptional regulators. c Schematic illustration of ICA applied to a gene expression compendium. See Supplementary Fig. 1a, b for additional details on data quality. The example TF is a dual regulator that primarily upregulates genes, and is activated by the green circular metabolite. Example experimental conditions shown are a TF knock-out, wild-type, and wild-type grown on medium supplemented with the activating metabolite. Each column of X contains an individual expression profile across 3923 genes in E. coli. d Each component (column of S) contains a coefficient for each gene. These coefficients are scaled by the component’s condition-specific activities (row in A) to form the component’s contribution to the transcriptomic compendium. e The sum of the contributions from the 92 components reconstructs most of the variance in the original compendium. f Independent components are converted into i-modulons by removing all genes with coefficients within a significance threshold (indicated in gray). Significant genes may have either positive (red) or negative (blue) coefficients. g Distribution of i-modulon categories. Categories of regulatory i-modulons are labeled in bold font. Genomic i-modulons account for single gene knock-outs, large deletions or duplications of genomic regions. Biological i-modulons contain genes enriched for a specific function, but are not linked to a specific transcriptional regulator. For more information, see Supplementary Fig. 1 and Supplementary Table 1.
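
The linear model in the caption (X = SA, per-component contributions, and thresholding into i-modulons) can be illustrated with a small sketch. This is not the authors' pipeline and it does not estimate S and A by ICA; it only shows how a known S and A combine. The gene names, matrix values, and the fixed cutoff of 0.5 are all made-up assumptions, and the helper names (`matmul`, `component_contribution`, `i_modulon`) are hypothetical.

```python
# Toy illustration of X = S A from the Fig. 1 caption.
# All numbers, gene names, and the cutoff below are invented for illustration.

def matmul(S, A):
    """Multiply a genes-by-components matrix by a components-by-conditions matrix."""
    return [[sum(S[g][k] * A[k][c] for k in range(len(A)))
             for c in range(len(A[0]))]
            for g in range(len(S))]

def component_contribution(S, A, k):
    """Column k of S scaled by row k of A: one component's contribution to X."""
    return [[S[g][k] * A[k][c] for c in range(len(A[0]))]
            for g in range(len(S))]

def i_modulon(S, genes, k, threshold):
    """Keep only genes whose coefficient in component k lies outside +/- threshold."""
    return {gene: S[g][k] for g, gene in enumerate(genes)
            if abs(S[g][k]) > threshold}

genes = ["geneA", "geneB", "geneC", "geneD"]   # hypothetical gene names
S = [[2.0, 0.0],      # rows: genes, columns: components
     [1.5, 0.1],
     [0.0, -1.0],
     [0.1, 0.0]]
A = [[0.0, 1.0, 2.0],  # rows: components, columns: conditions
     [1.0, 1.0, 0.0]]

# Full compendium: each column of X is one expression profile.
X = matmul(S, A)

# Summing the per-component contributions reproduces X.
contributions = [component_contribution(S, A, k) for k in range(len(A))]
summed = [[sum(C[g][c] for C in contributions) for c in range(len(A[0]))]
          for g in range(len(S))]

# Genes with large positive or negative coefficients form each i-modulon.
modulon_0 = i_modulon(S, genes, 0, threshold=0.5)
modulon_1 = i_modulon(S, genes, 1, threshold=0.5)
```

In this toy case the two components fully reconstruct X, whereas the caption states that the 92 components reconstruct most, not all, of the variance in the real compendium. The fixed cutoff stands in for the significance threshold mentioned in panel f, whose actual definition is not given in this record.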

    Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.