DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data

Abstract

Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are useful in downstream biological analyses. Multiple network inference tools exist to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), has been shown to be efficient at producing these gene-to-gene networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.
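The abstract describes the leave-one-out idea only in outline. The following is a minimal, dependency-free Python sketch of that idea under stated assumptions; it is not the authors' implementation. Each gene in turn is the response, all remaining genes are predictors, and impurity-based importances from a bagged regression forest become directed edge weights (predictor, target). The iRF variant is deliberately simplified to its reweighting step only: each forest samples candidate features in proportion to the previous forest's importances, and the random intersection trees and stability analysis of the original iRF are omitted. All function names and defaults (`n_trees`, `mtry`, `max_depth`, `n_iter`, per-target normalization) are hypothetical choices for illustration.

```python
import random


def _grow_tree(X, y, idx, weights, mtry, depth, max_depth, imp, rng):
    """Recursively split rows `idx`, adding each split's decrease in sum of
    squared errors (SSE) to imp[feature]. The tree itself is not stored,
    because LOOP network inference only needs the importances."""
    n = len(idx)
    if depth >= max_depth or n < 2:
        return
    s = sum(y[i] for i in idx)
    q = sum(y[i] * y[i] for i in idx)
    parent = q - s * s / n  # SSE of this node before splitting
    if parent <= 1e-12:
        return
    # Weighted sampling of mtry candidate features without replacement
    # (Efraimidis-Spirakis keys); uniform weights give a plain random forest.
    positive = [f for f, w in enumerate(weights) if w > 0]
    keys = {f: rng.random() ** (1.0 / weights[f]) for f in positive}
    candidates = sorted(positive, key=keys.get, reverse=True)[:mtry]
    best = None  # (gain, feature, left rows, right rows)
    for f in candidates:
        order = sorted(idx, key=lambda i: X[i][f])
        ls = lq = 0.0  # running sum and sum of squares of the left child
        for k in range(1, n):
            yi = y[order[k - 1]]
            ls += yi
            lq += yi * yi
            if X[order[k - 1]][f] == X[order[k]][f]:
                continue  # no threshold fits between equal values
            rs, rq = s - ls, q - lq
            sse = (lq - ls * ls / k) + (rq - rs * rs / (n - k))
            if best is None or parent - sse > best[0]:
                best = (parent - sse, f, order[:k], order[k:])
    if best is None or best[0] <= 1e-12:
        return
    gain, f, left, right = best
    imp[f] += gain
    for child in (left, right):
        _grow_tree(X, y, child, weights, mtry, depth + 1, max_depth, imp, rng)


def forest_importances(X, y, weights=None, n_trees=50, mtry=None,
                       max_depth=6, seed=0):
    """Impurity-based importances of a bagged forest, normalized to sum to 1."""
    p = len(X[0])
    weights = weights or [1.0] * p
    mtry = mtry or max(1, p // 3)
    rng = random.Random(seed)
    imp = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap rows
        _grow_tree(X, y, boot, weights, mtry, 0, max_depth, imp, rng)
    total = sum(imp)
    return [v / total for v in imp] if total > 0 else imp


def irf_importances(X, y, n_iter=3, **kw):
    """Simplified iRF: each forest samples features in proportion to the
    previous forest's importances (no random intersection trees)."""
    weights = [1.0] * len(X[0])
    imp = weights
    for _ in range(n_iter):
        imp = forest_importances(X, y, weights, **kw)
        if sum(imp) == 0:
            break  # nothing learnable for this target
        weights = imp
    return imp


def loop_network(expr, genes, importance_fn=forest_importances):
    """Leave-one-out prediction: regress each gene on all the others and
    store importances as directed edge weights keyed (predictor, target)."""
    edges = {}
    for t, target in enumerate(genes):
        preds = [g for g in range(len(genes)) if g != t]
        X = [[row[g] for g in preds] for row in expr]
        y = [row[t] for row in expr]
        for g, score in zip(preds, importance_fn(X, y)):
            edges[(genes[g], target)] = score
    return edges


if __name__ == "__main__":
    rng = random.Random(1)
    expr = []
    for _ in range(40):  # toy samples: C depends on A, B is independent noise
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        expr.append([a, b, 2 * a + rng.gauss(0, 0.1)])
    for name, fn in [("RF-LOOP", forest_importances), ("iRF-LOOP", irf_importances)]:
        net = loop_network(expr, ["A", "B", "C"], fn)
        print(name, sorted(net.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

In this toy example the edge from A to C should dominate the scores for target C under both variants, with the reweighted (iRF-style) forest concentrating even more weight on A. Normalizing importances per target is a choice made here so that scores are comparable across targets; a real pipeline would rank all edges genome-wide and compare them with a gold-standard network.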

Authors:
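Walker, Angelica [1]; Cliff, Ashley [1]; Romero, Jonathon C. [1]; Shah, Manesh [2]; Jones, Piet C. [1]; Gazolla, Joao Gabriel Felipe Machado [2]; Jacobson, Daniel A. [2]; Kainer, David [2]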
  1. University of Tennessee, Knoxville, TN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
June 22, 2022
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1876307
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Computational and Structural Biotechnology Journal
Additional Journal Information:
Journal Volume: 20; Journal Issue: na; Journal ID: ISSN 2001-0370
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; random forest; iterative random forest; gene expression networks; network biology

Citation Formats

Walker, Angelica, Cliff, Ashley, Romero, Jonathon C., Shah, Manesh, Jones, Piet C., Gazolla, Joao Gabriel Felipe Machado, Jacobson, Daniel A., and Kainer, David. Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data. United States: N. p., 2022. Web. doi:10.1016/j.csbj.2022.06.037.
Walker, Angelica, Cliff, Ashley, Romero, Jonathon C., Shah, Manesh, Jones, Piet C., Gazolla, Joao Gabriel Felipe Machado, Jacobson, Daniel A., & Kainer, David. Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data. United States. https://doi.org/10.1016/j.csbj.2022.06.037
Walker, Angelica, Cliff, Ashley, Romero, Jonathon C., Shah, Manesh, Jones, Piet C., Gazolla, Joao Gabriel Felipe Machado, Jacobson, Daniel A., and Kainer, David. 2022. "Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data". United States. https://doi.org/10.1016/j.csbj.2022.06.037. https://www.osti.gov/servlets/purl/1876307.
@article{osti_1876307,
title = {Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data},
author = {Walker, Angelica and Cliff, Ashley and Romero, Jonathon C. and Shah, Manesh and Jones, Piet C. and Gazolla, Joao Gabriel Felipe Machado and Jacobson, Daniel A. and Kainer, David},
abstractNote = {Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are useful in downstream biological analyses. Multiple network inference tools exist to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), has been shown to be efficient at producing these gene-to-gene networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.},
doi = {10.1016/j.csbj.2022.06.037},
journal = {Computational and Structural Biotechnology Journal},
volume = 20,
place = {United States},
year = {2022},
month = jun
}
