U.S. Department of Energy
Office of Scientific and Technical Information

Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data

Journal Article · Computational and Structural Biotechnology Journal

Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are useful in downstream biological analyses. Multiple network inference tools exist to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), has been shown to be efficient at producing these networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.
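The leave-one-out scheme behind RF-LOOP can be sketched as follows. Each gene in turn is treated as the target, a random forest is fit to predict its expression from all remaining genes, and the forest's feature importances become candidate edge weights from those genes to the target. This is a minimal, hypothetical sketch assuming scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` as the importance measure. The function names `rf_loop` and `ranked_edges` and the hyperparameters are illustrative and are not taken from the paper. The sketch covers only the RF-LOOP (GENIE3-style) baseline. scikit-learn has no built-in iterative Random Forest, so the iRF variant, which reweights features between iterations, is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_loop(expr, n_trees=100, seed=0):
    """Hypothetical RF-LOOP sketch (GENIE3-style), not the paper's code.

    expr: array of shape (n_samples, n_genes).
    Returns an (n_genes, n_genes) matrix W where W[i, j] is the importance
    of gene i when predicting gene j. The diagonal stays 0 (no self-edges).
    """
    n_samples, n_genes = expr.shape
    weights = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        # Leave the target gene out; every other gene is a candidate predictor.
        predictors = [g for g in range(n_genes) if g != target]
        # Assumption: default scikit-learn settings apart from tree count and seed.
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(expr[:, predictors], expr[:, target])
        # Impurity-based importances; for a non-constant target they sum to 1.
        weights[predictors, target] = model.feature_importances_
    return weights


def ranked_edges(weights, top_k=None):
    """Sort directed edges (regulator, target, weight) by decreasing weight."""
    n = weights.shape[0]
    edges = [(i, j, weights[i, j]) for i in range(n) for j in range(n) if i != j]
    edges.sort(key=lambda e: e[2], reverse=True)
    return edges[:top_k] if top_k is not None else edges
```

For example, given an expression matrix `expr` with samples as rows and genes as columns, `ranked_edges(rf_loop(expr), top_k=100)` would return the 100 highest-weighted directed edges. Those edges could then be compared against a gold-standard network, such as those provided by the DREAM Challenges.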

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1876307
Journal Information:
Computational and Structural Biotechnology Journal, Vol. 20; ISSN 2001-0370
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
