DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Learning epistatic polygenic phenotypes with Boolean interactions

Journal Article · PLoS ONE

Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and large-scale genome-wide data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.
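As a rough illustration of the final step described in the abstract (bootstrap sampling on a test set to quantify improvement in prediction accuracy), the sketch below compares a no-interaction predictor with a Boolean-interaction predictor on simulated data. It is not the epiTree implementation: the function names, the toy AND-type phenotype, the fixed probabilities, and the use of mean log-loss as the accuracy measure are all assumptions made for this example.

```python
import math
import random


def log_loss(y, p, eps=1e-12):
    """Mean negative log-likelihood of binary labels y under probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # guard against log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)


def bootstrap_improvement_pvalue(y, p_null, p_alt, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples of the test set in which the
    alternative (interaction) model does NOT achieve lower log-loss
    than the null (no-interaction) model. Hypothetical helper."""
    rng = random.Random(seed)
    n = len(y)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yb = [y[i] for i in idx]
        loss_null = log_loss(yb, [p_null[i] for i in idx])
        loss_alt = log_loss(yb, [p_alt[i] for i in idx])
        if loss_alt >= loss_null:
            not_better += 1
    return not_better / n_boot


def simulate_toy_data(n=400, seed=1):
    """Toy test set (assumed, not from the paper): two binary variants a, b;
    the phenotype is likely only when both are present (Boolean AND)."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n)]
    b = [rng.randint(0, 1) for _ in range(n)]
    y = [1 if rng.random() < (0.9 if ai and bi else 0.1) else 0
         for ai, bi in zip(a, b)]
    # Null predictor: additive in a and b, no interaction (illustrative values).
    p_null = [0.1 + 0.2 * ai + 0.2 * bi for ai, bi in zip(a, b)]
    # Alternative predictor: Boolean AND rule.
    p_alt = [0.9 if ai and bi else 0.1 for ai, bi in zip(a, b)]
    return y, p_null, p_alt


if __name__ == "__main__":
    y, p_null, p_alt = simulate_toy_data()
    print(bootstrap_improvement_pvalue(y, p_null, p_alt))
```

In this toy setting the Boolean rule matches how the data were generated, so the returned fraction should be close to zero. With real data, both predictors would be fitted on training data and evaluated only on held-out samples, as the abstract describes.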

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE; US Army Research Office (ARO)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
2470817
Journal Information:
PLoS ONE, Vol. 19, Issue 4; ISSN 1932-6203
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
