OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Iterative random forests to discover predictive and stable high-order interactions

Journal Article · Proceedings of the National Academy of Sciences of the United States of America
 [1];  [2];  [3];  [4]
  1. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853; Department of Statistical Science, Cornell University, Ithaca, NY 14853; Data Driven Decisions Department, Preminon LLC, Antioch, CA 94531
  2. Statistics Department, University of California, Berkeley, CA 94720
  3. Data Driven Decisions Department, Preminon LLC, Antioch, CA 94531; Statistics Department, University of California, Berkeley, CA 94720; Centre for Computational Biology, School of Biosciences, University of Birmingham, Edgbaston B15 2TT, United Kingdom; Molecular Ecosystems Biology Department, Biosciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
  4. Data Driven Decisions Department, Preminon LLC, Antioch, CA 94531; Statistics Department, University of California, Berkeley, CA 94720; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720

Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding how these high-order interactions drive gene expression presents a substantial statistical challenge. Building on random forests (RFs) and random intersection trees (RITs) and through extensive, biologically inspired simulations, we developed the iterative random forest algorithm (iRF). iRF trains a feature-weighted ensemble of decision trees to detect stable, high-order interactions with the same order of computational cost as the RF. We demonstrate the utility of iRF for high-order interaction discovery in two prediction problems: enhancer activity in the early Drosophila embryo and alternative splicing of primary transcripts in human-derived cell lines. In Drosophila, among the 20 pairwise transcription factor interactions iRF identifies as stable (returned in more than half of bootstrap replicates), 80% have been previously reported as physical interactions. Moreover, third-order interactions, e.g., between Zelda (Zld), Giant (Gt), and Twist (Twi), suggest high-order relationships that are candidates for follow-up experiments. In human-derived cells, iRF rediscovered a central role of H3K36me3 in chromatin-mediated splicing regulation and identified interesting fifth- and sixth-order interactions, indicative of multivalent nucleosomes with specific roles in splicing regulation. By decoupling the order of interactions from the computational cost of identification, iRF opens additional avenues of inquiry into the molecular mechanisms underlying genome biology.
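
The stability criterion mentioned in the abstract (an interaction counts as stable when it is returned in more than half of bootstrap replicates) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stability_scores` and `stable_interactions`, the feature labels `x1` to `x3`, and the toy replicate data are all hypothetical, and the step that actually recovers interactions from each fitted forest is assumed to have happened elsewhere.

```python
from collections import Counter


def stability_scores(replicate_interactions):
    """Return the fraction of bootstrap replicates that recovered each interaction.

    `replicate_interactions` holds one entry per bootstrap replicate; each entry
    is an iterable of interactions, and each interaction is an iterable of
    feature names. (Hypothetical input layout, chosen for illustration.)
    """
    n_replicates = len(replicate_interactions)
    if n_replicates == 0:
        return {}
    counts = Counter()
    for interactions in replicate_interactions:
        # Treat an interaction as an unordered set of features and count it
        # at most once per replicate.
        counts.update({frozenset(interaction) for interaction in interactions})
    return {inter: c / n_replicates for inter, c in counts.items()}


def stable_interactions(replicate_interactions, threshold=0.5):
    """Keep interactions returned in strictly more than `threshold` of replicates."""
    scores = stability_scores(replicate_interactions)
    return {inter: s for inter, s in scores.items() if s > threshold}


# Made-up toy data: four bootstrap replicates over placeholder features.
replicates = [
    [("x1", "x2"), ("x2", "x3")],
    [("x2", "x1")],
    [("x1", "x2", "x3")],
    [("x1", "x2"), ("x1", "x2", "x3")],
]
stable = stable_interactions(replicates)
```

In this toy input the pair `{x1, x2}` appears in three of four replicates and passes the threshold, while the third-order set `{x1, x2, x3}` appears in exactly half and is excluded because the comparison is strict.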

Research Organization:
Preminon, LLC, Antioch, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC); National Institutes of Health (NIH) National Human Genome Research Institute (NHGRI); US Army Research Office (ARO); US Department of the Navy, Office of Naval Research (ONR); National Science Foundation (NSF); Center for Science of Information; National Library of Medicine
Grant/Contract Number:
DOE DE-AC02-05CH11231; SC0017069; AC02-05CH11231; U01HG007031; W911NF1710005; N00014-16-1-2664; R00 HG006698; DMS-1613002; T32LM012417
OSTI ID:
1417528
Alternate ID(s):
OSTI ID: 1625005
Journal Information:
Proceedings of the National Academy of Sciences of the United States of America, Vol. 115, Issue 8; ISSN 0027-8424
Publisher:
Proceedings of the National Academy of Sciences
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 129 works (citation information provided by Web of Science)
