Title: Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity

Journal Article · mSystems
[1]; [2]; [3]; [4]; [5]; [6]
  1. Georgia Inst. of Technology, Atlanta, GA (United States). School of Civil and Environmental Engineering
  2. Michigan State Univ., East Lansing, MI (United States). Center for Microbial Ecology
  3. Michigan State Univ., East Lansing, MI (United States). Center for Microbial Ecology, Dept. of Microbiology and Molecular Genetics, and Dept. of Plant, Soil and Microbial Sciences
  4. Michigan State Univ., East Lansing, MI (United States). Center for Microbial Ecology and Dept. of Plant, Soil and Microbial Sciences
  5. Georgia Inst. of Technology, Atlanta, GA (United States). School of Civil and Environmental Engineering and School of Biological Sciences
  6. Univ. of North Carolina, Charlotte, NC (United States)

Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (Nd) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that Nd additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
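
The abstract compares Nd against the Shannon index computed on 16S rRNA gene profiles. As a point of reference, the following is a minimal sketch of the standard Shannon index, H' = -sum(p_i * ln p_i), computed from a vector of per-taxon counts. It is not part of Nonpareil and does not compute Nd; the function name `shannon_index` and the example count vectors are hypothetical and chosen only for illustration.

```python
import math


def shannon_index(counts):
    """Return the Shannon index H' = -sum(p_i * ln p_i).

    `counts` is a sequence of non-negative per-taxon counts
    (for example, reads assigned to each 16S rRNA gene OTU).
    Taxa with zero counts contribute nothing to the sum.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this taxon
            h -= p * math.log(p)   # natural log, so H' is in nats
    return h


# Hypothetical profiles: four equally abundant taxa vs. one dominant taxon.
even_profile = [25, 25, 25, 25]
skewed_profile = [97, 1, 1, 1]

print(shannon_index(even_profile))    # equals ln(4) for a perfectly even profile
print(shannon_index(skewed_profile))  # lower, because one taxon dominates
```

For a perfectly even profile of S taxa the index reaches its maximum, ln(S), and it falls toward zero as a single taxon dominates. The natural logarithm is an assumption here; some tools report the index in base 2 instead.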

Research Organization:
Univ. of Wisconsin, Madison, WI (United States); Univ. of Tennessee, Knoxville, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
FC02-07ER64494; SC0006662
OSTI ID:
1511043
Journal Information:
mSystems, Vol. 3, Issue 3; ISSN 2379-5077
Publisher:
American Society for Microbiology
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 104 works
Citation information provided by
Web of Science
