U.S. Department of Energy
Office of Scientific and Technical Information

Poisson hurdle model-based method for clustering microbiome features

Journal Article · Bioinformatics
Abstract

Motivation

High-throughput sequencing technologies have greatly facilitated microbiome research and have generated a large volume of microbiome data with the potential to answer key questions regarding microbiome assembly, structure and function. Cluster analysis aims to group features that behave similarly across treatments. Such grouping helps to highlight the functional relationships among features and may provide biological insights into microbiome networks. However, clustering microbiome data is challenging because the data are sparse and high-dimensional.

Results

We propose a model-based clustering method based on Poisson hurdle models for sparse microbiome count data. We describe an expectation–maximization algorithm and a modified version that uses simulated annealing to conduct the cluster analysis. We also provide algorithms for initialization and for choosing the number of clusters. Simulation results demonstrate that our proposed methods provide better clustering results than alternative methods under a variety of settings. We also apply the proposed method to a sorghum rhizosphere microbiome dataset, which yields interesting biological findings.
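
To make the idea of a Poisson hurdle mixture concrete, the following is a minimal Python sketch, not the PHclust implementation. It assumes a deliberately simplified model in which each cluster has one zero probability and one zero-truncated Poisson rate shared across all samples, with no treatment groups, normalization factors, simulated annealing, or model selection. All function names (`hurdle_logpmf`, `truncated_poisson_rate`, `em_cluster`), the deterministic initialization, and the toy counts are illustrative assumptions.

```python
import math

def hurdle_logpmf(y, p0, lam):
    """Log-probability of count y: zeros occur with probability p0,
    positive counts follow a zero-truncated Poisson with rate lam."""
    if y == 0:
        return math.log(p0)
    log_trunc = (y * math.log(lam) - lam - math.lgamma(y + 1)
                 - math.log1p(-math.exp(-lam)))
    return math.log1p(-p0) + log_trunc

def truncated_poisson_rate(mean_pos, iters=200):
    """Solve lam / (1 - exp(-lam)) = mean_pos by fixed-point iteration."""
    if mean_pos <= 1.0:
        return 1e-6  # boundary case: every positive count equals 1
    lam = mean_pos
    for _ in range(iters):
        lam = mean_pos * (1.0 - math.exp(-lam))
    return lam

def em_cluster(counts, k, iters=50):
    """Cluster the rows of counts (features x samples) into k groups."""
    n = len(counts)
    # Assumed initialization: rank features by total count, cut into k blocks.
    order = sorted(range(n), key=lambda i: sum(counts[i]))
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0
    for _ in range(iters):
        # M-step: weighted estimates of (mixing weight, p0, lam) per cluster.
        params = []
        for c in range(k):
            w_all = w_zero = w_pos = s_pos = 0.0
            for i, row in enumerate(counts):
                r = resp[i][c]
                for y in row:
                    w_all += r
                    if y == 0:
                        w_zero += r
                    else:
                        w_pos += r
                        s_pos += r * y
            p0 = w_zero / w_all if w_all > 0 else 0.5
            p0 = min(max(p0, 1e-6), 1.0 - 1e-6)
            lam = truncated_poisson_rate(s_pos / w_pos) if w_pos > 0 else 1e-6
            weight = max(sum(resp[i][c] for i in range(n)) / n, 1e-12)
            params.append((weight, p0, lam))
        # E-step: posterior cluster probabilities via log-sum-exp.
        for i, row in enumerate(counts):
            logs = [math.log(w) + sum(hurdle_logpmf(y, p0, lam) for y in row)
                    for (w, p0, lam) in params]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]
    # Hard assignment: most probable cluster for each feature.
    return [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]

# Toy data (made up for illustration): three sparse features, three abundant ones.
toy_counts = [
    [0, 0, 1, 0, 2, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 2, 0, 1],
    [12, 9, 0, 11, 10, 13],
    [8, 10, 12, 0, 9, 11],
    [10, 11, 9, 12, 0, 10],
]
toy_labels = em_cluster(toy_counts, 2)
print(toy_labels)
```

The hurdle structure treats the zero probability and the positive-count rate as separate parameters, so a cluster can be sparse without forcing a small rate on its non-zero counts. That separation is the property that makes this model family a natural fit for sparse count data.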

Availability and implementation

The R package PHclust is freely available for download at https://cran.r-project.org/package=PHclust.

Supplementary information

Supplementary data are available at Bioinformatics online.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231; SC0014395
OSTI ID:
1908292
Alternate ID(s):
OSTI ID: 1906225
OSTI ID: 1986020
Journal Information:
Bioinformatics, Vol. 39, Issue 1; ISSN 1367-4811
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
