OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy

Abstract

Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs, chosen throughout our prediction rank-list and tested by whole-mount embryonic imaging of stably integrated reporter constructs, showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.

Authors:
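Arbel, Hamutal [1]; Basu, Sumanta [2]; Fisher, William W. [3]; Hammonds, Ann S. [3]; Wan, Kenneth H. [3]; Park, Soo [3]; Weiszmann, Richard [3]; Booth, Benjamin W. [3]; Keranen, Soile V. [3]; Henriquez, Clara [3]; Shams Solari, Omid [4]; Bickel, Peter J. [4]; Biggin, Mark D. [3]; Celniker, Susan E. [4]; Brown, James B. [5]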
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Cornell Univ., Ithaca, NY (United States)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  4. Univ. of California, Berkeley, CA (United States)
  5. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Univ. of Birmingham (United Kingdom)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1559177
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article
Journal Name:
Proceedings of the National Academy of Sciences of the United States of America
Additional Journal Information:
Journal Volume: 116; Journal Issue: 3; Journal ID: ISSN 0027-8424
Publisher:
National Academy of Sciences, Washington, DC (United States)
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 54 ENVIRONMENTAL SCIENCES; enhancers; embryo development; machine learning; random forest; Drosophila

Citation Formats

Arbel, Hamutal, Basu, Sumanta, Fisher, William W., Hammonds, Ann S., Wan, Kenneth H., Park, Soo, Weiszmann, Richard, Booth, Benjamin W., Keranen, Soile V., Henriquez, Clara, Shams Solari, Omid, Bickel, Peter J., Biggin, Mark D., Celniker, Susan E., and Brown, James B. Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy. United States: N. p., 2018. Web. doi:10.1073/pnas.1808833115.
Arbel, Hamutal, Basu, Sumanta, Fisher, William W., Hammonds, Ann S., Wan, Kenneth H., Park, Soo, Weiszmann, Richard, Booth, Benjamin W., Keranen, Soile V., Henriquez, Clara, Shams Solari, Omid, Bickel, Peter J., Biggin, Mark D., Celniker, Susan E., & Brown, James B. Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy. United States. doi:10.1073/pnas.1808833115.
Arbel, Hamutal, Basu, Sumanta, Fisher, William W., Hammonds, Ann S., Wan, Kenneth H., Park, Soo, Weiszmann, Richard, Booth, Benjamin W., Keranen, Soile V., Henriquez, Clara, Shams Solari, Omid, Bickel, Peter J., Biggin, Mark D., Celniker, Susan E., and Brown, James B. Mon . "Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy". United States. doi:10.1073/pnas.1808833115. https://www.osti.gov/servlets/purl/1559177.
@article{osti_1559177,
title = {Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy},
author = {Arbel, Hamutal and Basu, Sumanta and Fisher, William W. and Hammonds, Ann S. and Wan, Kenneth H. and Park, Soo and Weiszmann, Richard and Booth, Benjamin W. and Keranen, Soile V. and Henriquez, Clara and Shams Solari, Omid and Bickel, Peter J. and Biggin, Mark D. and Celniker, Susan E. and Brown, James B.},
abstractNote = {Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs, chosen throughout our prediction rank-list and tested by whole-mount embryonic imaging of stably integrated reporter constructs, showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.},
doi = {10.1073/pnas.1808833115},
journal = {Proceedings of the National Academy of Sciences of the United States of America},
issn = {0027-8424},
number = 3,
volume = 116,
place = {United States},
year = {2018},
month = {11}
}
