Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data
Abstract
Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are useful in downstream biological analyses. Multiple network inference tools exist to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), is a method that has been shown to be efficient at producing these networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.
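The leave-one-out prediction loop named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: each gene is treated in turn as the target and every other gene as a predictor, and the per-predictor scores become the weights of directed edges into the target. To keep the example dependency-free, the scorer is a stand-in (absolute Pearson correlation); in RF-LOOP the scores would be the feature importances of a random forest, and in iRF-LOOP those of an iterative random forest. The names `loop_network` and `abs_pearson_importance`, the per-target normalization, and the toy data are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]  # rows = samples, columns = genes


def abs_pearson_importance(X: Matrix, y: List[float]) -> List[float]:
    """Stand-in scorer: |Pearson r| between each predictor column and y.

    Assumption: in RF-LOOP this would instead be the feature importances
    of a random forest trained to predict y from X.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(d * d for d in y_dev))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        c_dev = [v - c_mean for v in col]
        c_norm = math.sqrt(sum(d * d for d in c_dev))
        if c_norm == 0.0 or y_norm == 0.0:
            scores.append(0.0)  # constant column carries no signal
        else:
            cov = sum(a * b for a, b in zip(c_dev, y_dev))
            scores.append(abs(cov) / (c_norm * y_norm))
    return scores


def loop_network(
    expr: Matrix,
    importance_fn: Callable[[Matrix, List[float]], List[float]] = abs_pearson_importance,
) -> Matrix:
    """Return adjacency A where A[i][j] scores gene i as a predictor of gene j."""
    n_genes = len(expr[0])
    adj = [[0.0] * n_genes for _ in range(n_genes)]
    for target in range(n_genes):
        # Leave the target gene out of the predictor set.
        predictors = [g for g in range(n_genes) if g != target]
        X = [[row[g] for g in predictors] for row in expr]
        y = [row[target] for row in expr]
        scores = importance_fn(X, y)
        total = sum(scores)
        # Assumption: normalize scores per target so each column sums to 1.
        for g, s in zip(predictors, scores):
            adj[g][target] = s / total if total > 0 else 0.0
    return adj


# Hypothetical toy data: gene 1 depends on gene 0, gene 2 is independent noise.
rng = random.Random(0)
expr = []
for _ in range(50):
    g0 = rng.gauss(0, 1)
    g1 = 2.0 * g0 + rng.gauss(0, 0.1)
    g2 = rng.gauss(0, 1)
    expr.append([g0, g1, g2])

network = loop_network(expr)
```

Passing the scorer as a parameter keeps the loop itself unchanged when the stand-in is swapped for a forest-based importance function; only the source of the scores differs between the RF and iRF variants.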
- Authors:
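- Walker, Angelica; Cliff, Ashley; Romero, Jonathon C.; Shah, Manesh; Jones, Piet C.; Gazolla, Joao Gabriel Felipe Machado; Jacobson, Daniel A.; Kainer, David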
- University of Tennessee, Knoxville, TN (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Publication Date:
- June 22, 2022
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1876307
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Computational and Structural Biotechnology Journal
- Additional Journal Information:
- Journal Volume: 20; Journal Issue: na; Journal ID: ISSN 2001-0370
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; random forest; iterative random forest; gene expression networks; network biology
Citation Formats
Walker, Angelica, Cliff, Ashley, Romero, Jonathon C., Shah, Manesh, Jones, Piet C., Gazolla, Joao Gabriel Felipe Machado, Jacobson, Daniel A., and Kainer, David. Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data. United States: N. p., 2022.
Web. doi:10.1016/j.csbj.2022.06.037.
Walker, Angelica, Cliff, Ashley, Romero, Jonathon C., Shah, Manesh, Jones, Piet C., Gazolla, Joao Gabriel Felipe Machado, Jacobson, Daniel A., & Kainer, David. Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data. United States. https://doi.org/10.1016/j.csbj.2022.06.037
Walker, Angelica, Cliff, Ashley, Romero, Jonathon C., Shah, Manesh, Jones, Piet C., Gazolla, Joao Gabriel Felipe Machado, Jacobson, Daniel A., and Kainer, David. 2022.
"Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data". United States. https://doi.org/10.1016/j.csbj.2022.06.037. https://www.osti.gov/servlets/purl/1876307.
@article{osti_1876307,
title = {Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data},
author = {Walker, Angelica and Cliff, Ashley and Romero, Jonathon C. and Shah, Manesh and Jones, Piet C. and Gazolla, Joao Gabriel Felipe Machado and Jacobson, Daniel A. and Kainer, David},
abstractNote = {Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are useful in downstream biological analyses. Multiple network inference tools exist to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), is a method that has been shown to be efficient at producing these networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.},
doi = {10.1016/j.csbj.2022.06.037},
journal = {Computational and Structural Biotechnology Journal},
volume = 20,
place = {United States},
year = {2022},
month = jun
}