U.S. Department of Energy
Office of Scientific and Technical Information

Feature selection and causal analysis for microbiome studies in the presence of confounding using standardization

Journal Article · BMC Bioinformatics

Abstract

Background

Microbiome studies have uncovered associations between microbes and human, animal, and plant health outcomes. This has led to interest in developing microbial interventions for treating disease and optimizing crop yields, which requires identifying the microbiome features that affect the outcome in the population of interest. This task is challenging because of the high dimensionality of microbiome data and the confounding that results from the complex and dynamic interactions among host, environment, and microbiome. In the presence of such confounding, variable selection and estimation procedures may perform poorly in identifying microbial features with an effect on the outcome.

Results

In this manuscript, we aim to estimate population-level effects of individual microbiome features while controlling for confounding by a categorical variable. Because of the high dimensionality and the confounding-induced correlation between features, we propose feature screening, selection, and estimation conditional on each stratum of the confounder, followed by a standardization approach to estimating population-level effects of individual features. Comprehensive simulation studies demonstrate the advantages of our approach in recovering relevant features. Using a potential-outcomes framework, we outline the assumptions required to ascribe causal, rather than associational, interpretations to the identified microbiome effects. We conducted an agricultural study of the rhizosphere microbiome of sorghum in which nitrogen fertilizer application is a confounding variable. In this study, the proposed approach identified microbial taxa that are consistent with biological understanding of potential plant-microbe interactions.
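To illustrate the standardization step described above, the following is a minimal sketch rather than the authors' implementation. It assumes a single feature, a linear outcome model, and ordinary least squares within each stratum; the screening and selection steps that the abstract describes for high-dimensional data are omitted. The function names (stratum_slope, standardized_effect, pooled_slope) and the simulated data are hypothetical and chosen only for illustration. The sketch estimates the feature's slope separately in each stratum of the categorical confounder, then averages those slopes with weights equal to the observed stratum proportions.

```python
import numpy as np


def stratum_slope(x, y):
    """Least-squares slope of y on x (with an intercept) within one stratum."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def pooled_slope(x, y):
    """Naive slope that ignores the confounder entirely (for comparison)."""
    return stratum_slope(np.asarray(x, float), np.asarray(y, float))


def standardized_effect(x, y, z):
    """Average the stratum-specific slopes, weighted by stratum frequency.

    x: feature values, y: outcome, z: categorical confounder labels.
    Assumption: the effect of x is linear within each stratum, so the
    population-level effect is sum_z P(Z = z) * slope_z.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    z = np.asarray(z)
    levels, counts = np.unique(z, return_counts=True)
    weights = counts / counts.sum()
    slopes = [stratum_slope(x[z == level], y[z == level]) for level in levels]
    return float(np.dot(weights, slopes))


if __name__ == "__main__":
    # Hypothetical simulated data: the confounder z shifts both the
    # feature x and the outcome y; the effect of x on y is 0.5 in
    # every stratum.
    rng = np.random.default_rng(0)
    z = rng.integers(0, 2, size=400)
    x = rng.normal(loc=2.0 * z, scale=1.0)
    y = 0.5 * x + 3.0 * z + rng.normal(scale=0.5, size=400)

    print("pooled slope (ignores confounder):", pooled_slope(x, y))
    print("standardized effect:", standardized_effect(x, y, z))
```

In this simulated setting the pooled slope absorbs part of the confounder's effect on the outcome, whereas the standardized estimate is built only from within-stratum comparisons. A real analysis along the lines of the abstract would replace the per-stratum least-squares fit with the stratum-specific screening, selection, and estimation procedure.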

Conclusions

Standardization enables more accurate identification of individual microbiome features with an effect on the outcome of interest compared to other variable selection and estimation procedures when there is confounding by a categorical variable.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Univ. of Nebraska, Lincoln, NE (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231; SC0014395
OSTI ID:
1805307
Alternate ID(s):
OSTI ID: 1814837
OSTI ID: 1828346
Journal Information:
BMC Bioinformatics, Vol. 22, Issue 1; ISSN 1471-2105
Publisher:
Springer Science + Business Media
Country of Publication:
United Kingdom
Language:
English
