OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?

Abstract

The most common practice in studying and cataloguing prokaryotic diversity involves grouping sequences into operational taxonomic units (OTUs) at the 97% 16S rRNA gene sequence identity level, often using partial gene sequences such as PCR-generated amplicons. Because rRNA genes are highly conserved, organisms belonging to closely related yet distinct species may be grouped under the same OTU. However, it remains unclear how much diversity this practice has underestimated. To address this question, we compared OTUs of genomes defined at the 97% or 98.5% 16S rRNA gene identity level against OTUs of the same genomes defined at 95% whole-genome average nucleotide identity (ANI), which is a much more accurate proxy for species. Our results show that, relative to 95% ANI, OTUs produced by a 98.5% 16S rRNA gene identity cutoff are more accurate than those produced by a 97% cutoff (90.5% versus 89.9% accuracy) but are indistinguishable from those produced by any other threshold in the 98.29 to 98.78% range. Even with the more stringent thresholds, the 16S rRNA gene-based approach commonly underestimates the number of OTUs by ~12%, on average, compared to the ANI-based approach (~14% underestimation when using the 97% identity threshold). Moreover, the degree of underestimation can reach 50% or more for certain taxa, such as the genera Pseudomonas, Burkholderia, Escherichia, Campylobacter, and Citrobacter. These results provide a quantitative view of the degree to which 16S rRNA gene-defined OTUs underestimate extant prokaryotic diversity and suggest that genomic resolution is often necessary.

IMPORTANCE: Species diversity is one of the most fundamental pieces of information in community ecology and conservation biology. Accurate proxies for what constitutes a species, or the unit of diversity, are therefore a cornerstone of a large body of microbial ecology and diversity studies. The most common proxies currently in use cluster 16S rRNA gene sequences at a nucleotide identity threshold, typically 97% or 98.5%. Here, we explore how well this strategy reflects the more accurate whole-genome-based proxies and determine how often the high conservation of 16S rRNA gene sequences masks substantial species-level diversity.
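The OTU comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes OTUs are formed by single-linkage clustering (any chain of pairs at or above the cutoff joins one OTU), assumes underestimation is expressed relative to the ANI-based OTU count, and uses hypothetical genome names (`g1` to `g4`), identity values, and helper names (`cluster_at_threshold`, `underestimation`).

```python
from itertools import combinations


def cluster_at_threshold(names, identity, threshold):
    """Group names into OTUs by single linkage (an assumption, not the paper's
    stated method): two names share an OTU if a chain of pairs with
    identity >= threshold (%) connects them."""
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if identity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    otus = {}
    for n in names:
        otus.setdefault(find(n), []).append(n)
    return list(otus.values())


def underestimation(n_16s, n_ani):
    """Assumed definition: fraction of ANI-defined OTUs missed by 16S OTUs."""
    return 1.0 - n_16s / n_ani


# Hypothetical pairwise identities (%) for four genomes.
genomes = ["g1", "g2", "g3", "g4"]
rrna = {frozenset(p): v for p, v in {
    ("g1", "g2"): 99.6, ("g1", "g3"): 98.9, ("g2", "g3"): 99.0,
    ("g1", "g4"): 94.0, ("g2", "g4"): 94.2, ("g3", "g4"): 93.8}.items()}
ani = {frozenset(p): v for p, v in {
    ("g1", "g2"): 97.5, ("g1", "g3"): 91.0, ("g2", "g3"): 90.8,
    ("g1", "g4"): 78.0, ("g2", "g4"): 78.3, ("g3", "g4"): 77.9}.items()}

otus_16s = cluster_at_threshold(genomes, rrna, 98.5)  # 98.5% 16S cutoff
otus_ani = cluster_at_threshold(genomes, ani, 95.0)   # 95% ANI species cutoff
print(len(otus_16s), len(otus_ani))                   # 2 3
print(f"{underestimation(len(otus_16s), len(otus_ani)):.0%}")  # 33%
```

In this toy case the 98.5% 16S rRNA gene cutoff merges `g1`, `g2`, and `g3`, whereas 95% ANI keeps `g3` apart, so the 16S-based count is one-third lower than the ANI-based count. Common OTU-picking tools also offer other algorithms (e.g., average-linkage or greedy centroid-based clustering), which can group real data differently.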

Authors:
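Rodriguez-R, Luis M. [1]; Castro, Juan C. [1]; Kyrpides, Nikos C. [2]; Cole, James R. [3]; Tiedje, James M. [3]; Konstantinidis, Konstantinos T. [1]; Löffler, Frank E. [4]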
  1. Georgia Inst. of Technology, Atlanta, GA (United States)
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  3. Michigan State Univ., East Lansing, MI (United States)
  4. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
Research Org.:
Michigan State Univ., East Lansing, MI (United States); Univ. of California, Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Scientific User Facilities Division
OSTI Identifier:
1503621
Grant/Contract Number:  
FG02-99ER62848
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Applied and Environmental Microbiology
Additional Journal Information:
Journal Volume: 84; Journal Issue: 6; Journal ID: ISSN 0099-2240
Publisher:
American Society for Microbiology
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; average nucleotide identity; diversity; 16S rRNA gene

Citation Formats

Rodriguez-R, Luis M., Castro, Juan C., Kyrpides, Nikos C., Cole, James R., Tiedje, James M., Konstantinidis, Konstantinos T., and Löffler, Frank E. How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity? United States: N. p., 2018. Web. doi:10.1128/aem.00014-18.
Rodriguez-R, Luis M., Castro, Juan C., Kyrpides, Nikos C., Cole, James R., Tiedje, James M., Konstantinidis, Konstantinos T., & Löffler, Frank E. How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity? United States. doi:10.1128/aem.00014-18.
Rodriguez-R, Luis M., Castro, Juan C., Kyrpides, Nikos C., Cole, James R., Tiedje, James M., Konstantinidis, Konstantinos T., and Löffler, Frank E. 2018. "How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?" United States. doi:10.1128/aem.00014-18. https://www.osti.gov/servlets/purl/1503621.
@article{osti_1503621,
title = {How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?},
author = {Rodriguez-R, Luis M. and Castro, Juan C. and Kyrpides, Nikos C. and Cole, James R. and Tiedje, James M. and Konstantinidis, Konstantinos T. and Löffler, Frank E.},
doi = {10.1128/aem.00014-18},
journal = {Applied and Environmental Microbiology},
issn = {0099-2240},
number = 6,
volume = 84,
place = {United States},
year = {2018},
month = {1}
}


Citation Metrics:
Cited by: 5 works
Citation information provided by Web of Science

Figures / Tables:

FIG 1: Most accurate 16S rRNA gene identity thresholds with respect to 95% ANI. The figure shows the F1 score (top) and accuracy (bottom) of different 16S rRNA gene identity thresholds (x axis), using 95% ANI as the reference. Both metrics represent trade-offs between recall and precision. For each metric, the plot displays, as bands, the summary statistics of 1,000 bootstrap rounds on the NCBI-Prok collection: mean (solid line), 80% power range (β20%; darker band), interquartile range (IQR; intermediate band), and 95% confidence interval (CI95%; lightest band). In the lower portion of each panel (horizontal shading), the identity thresholds with the highest F1 score or accuracy are marked with vertical solid black lines (98.32% and 98.64% for F1; 98.64% for accuracy). Concentric gray bands indicate the regions in which the mean F1 score or accuracy falls within the β20%, IQR, and 95% CI ranges of the thresholds with the highest values. The 16S rRNA gene identity threshold used in this study (98.5%) is indicated with a filled black arrowhead, the default 16S rRNA gene identity threshold in QIIME and mothur (97%) with a filled gray arrowhead, and other, less common thresholds used in the literature (98.65% and 98.7%) with open black arrowheads. All thresholds except 97% are within the β20% range of the highest F1 score.
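One way to score a 16S rRNA gene threshold against a 95% ANI reference, as in Fig. 1, is sketched below. It assumes pair-counting metrics: a pair of genomes is a true positive when both partitions put it in the same group, so accuracy equals the Rand index. This is consistent with the Rand (1971) and Hubert and Arabie (1985) entries in the reference list, but it is still an assumption, as is the single-linkage clustering. The identities, species labels, and helper names (`single_linkage_labels`, `pair_scores`) are hypothetical, and the bootstrap over the NCBI-Prok collection is omitted.

```python
from itertools import combinations


def single_linkage_labels(n, identity, threshold):
    """Assumed OTU picking: label n genomes by single linkage at threshold (%)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), value in identity.items():
        if value >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def pair_scores(test, ref):
    """Pair-counting F1 and accuracy (Rand index) of `test` against `ref`."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(ref)), 2):
        same_test, same_ref = test[i] == test[j], ref[i] == ref[j]
        if same_test and same_ref:
            tp += 1
        elif same_test:
            fp += 1  # lumped by 16S, split by ANI
        elif same_ref:
            fn += 1  # split by 16S, lumped by ANI
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, (tp + tn) / (tp + fp + fn + tn)


# Hypothetical data: 16S identities (%) for five genomes and their 95% ANI species labels.
rrna = {(0, 1): 99.8, (0, 2): 98.2, (0, 3): 98.0, (1, 2): 98.3, (1, 3): 98.1,
        (2, 3): 99.1, (0, 4): 96.5, (1, 4): 96.6, (2, 4): 97.2, (3, 4): 96.9}
ani_species = [0, 0, 1, 1, 2]

# Sweep thresholds from 97.0% to 100.0% in 0.1% steps.
thresholds = [t / 10 for t in range(970, 1001)]
scores = {t: pair_scores(single_linkage_labels(5, rrna, t), ani_species) for t in thresholds}
top_f1 = max(f1 for f1, _ in scores.values())
best = [t for t in thresholds if scores[t][0] == top_f1]
print(f"97.0%: F1={scores[97.0][0]:.2f}, accuracy={scores[97.0][1]:.2f}")
print(f"best F1={top_f1:.2f} for thresholds {best[0]:.1f}-{best[-1]:.1f}%")
```

With these toy values, 97% lumps all five genomes into one OTU (F1 of 0.33, accuracy of 0.20), while every threshold from 98.4% to 99.1% recovers the reference partition exactly. Reporting the range of tied optima rather than a single best value mirrors how the figure shows a band of statistically indistinguishable thresholds.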


Works referenced in this record:

Then and now: a systematic review of the systematics of prokaryotes in the last 80 years
journal, December 2013


NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
journal, December 2004

  • Pruitt, K. D.
  • Nucleic Acids Research, Vol. 33, Database issue
  • DOI: 10.1093/nar/gki025

Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data
journal, October 2013

  • Eren, A. Murat; Maignien, Loïs; Sul, Woo Jun
  • Methods in Ecology and Evolution, Vol. 4, Issue 12
  • DOI: 10.1111/2041-210X.12114

Diversity of 16S rRNA Genes within Individual Prokaryotic Genomes
journal, April 2010

  • Pei, A. Y.; Oberdorf, W. E.; Nossa, C. W.
  • Applied and Environmental Microbiology, Vol. 76, Issue 12
  • DOI: 10.1128/AEM.02953-09

Genotypic Diversity Within a Natural Coastal Bacterioplankton Population
journal, February 2005


Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes
journal, February 2014

  • Kim, M.; Oh, H. -S.; Park, S. -C.
  • International Journal of Systematic and Evolutionary Microbiology, Vol. 64, Issue Pt 2
  • DOI: 10.1099/ijs.0.059774-0

Unusual biology across a group comprising more than 15% of domain Bacteria
journal, June 2015

  • Brown, Christopher T.; Hug, Laura A.; Thomas, Brian C.
  • Nature, Vol. 523, Issue 7559
  • DOI: 10.1038/nature14486

Trait-based approaches for understanding microbial biodiversity and ecosystem functioning
journal, May 2014


Estimating prokaryotic diversity and its limits
journal, July 2002

  • Curtis, T. P.; Sloan, W. T.; Scannell, J. W.
  • Proceedings of the National Academy of Sciences, Vol. 99, Issue 16, p. 10494-10499
  • DOI: 10.1073/pnas.142680199

Objective Criteria for the Evaluation of Clustering Methods
journal, December 1971


BEDTools: a flexible suite of utilities for comparing genomic features
journal, January 2010


The species concept for prokaryotes
journal, January 2001


V-Xtractor: An open-source, high-throughput software tool to identify and extract hypervariable regions of small subunit (16S/18S) ribosomal RNA gene sequences
journal, November 2010

  • Hartmann, Martin; Howes, Charles G.; Abarenkov, Kessy
  • Journal of Microbiological Methods, Vol. 83, Issue 2
  • DOI: 10.1016/j.mimet.2010.08.008

Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities
journal, October 2009

  • Schloss, P. D.; Westcott, S. L.; Ryabin, T.
  • Applied and Environmental Microbiology, Vol. 75, Issue 23, p. 7537-7541
  • DOI: 10.1128/AEM.01541-09

Complete Genome Sequence of Borrelia afzelii K78 and Comparative Genome Analysis
journal, March 2015


Microbiomes in light of traits: A phylogenetic perspective
journal, November 2015


QIIME allows analysis of high-throughput community sequencing data
journal, April 2010

  • Caporaso, J. Gregory; Kuczynski, Justin; Stombaugh, Jesse
  • Nature Methods, Vol. 7, Issue 5
  • DOI: 10.1038/nmeth.f.303

The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes
preprint, March 2016


Analysis of Ten Brucella Genomes Reveals Evidence for Horizontal Gene Transfer Despite a Preferred Intracellular Lifestyle
journal, April 2009

  • Wattam, A. R.; Williams, K. P.; Snyder, E. E.
  • Journal of Bacteriology, Vol. 191, Issue 11
  • DOI: 10.1128/JB.01767-08

Comparing partitions
journal, December 1985

  • Hubert, Lawrence; Arabie, Phipps
  • Journal of Classification, Vol. 2, Issue 1
  • DOI: 10.1007/BF01908075

Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life
journal, September 2017

  • Parks, Donovan H.; Rinke, Christian; Chuvochina, Maria
  • Nature Microbiology, Vol. 2, Issue 11
  • DOI: 10.1038/s41564-017-0012-7

Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences
journal, August 2014

  • Yarza, Pablo; Yilmaz, Pelin; Pruesse, Elmar
  • Nature Reviews Microbiology, Vol. 12, Issue 9
  • DOI: 10.1038/nrmicro3330

Recombination and the Nature of Bacterial Speciation
journal, January 2007


Scaling laws predict global microbial diversity
journal, May 2016

  • Locey, Kenneth J.; Lennon, Jay T.
  • Proceedings of the National Academy of Sciences, Vol. 113, Issue 21
  • DOI: 10.1073/pnas.1521291113

Microbial species delineation using whole genome sequences
journal, July 2015

  • Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia
  • Nucleic Acids Research, Vol. 43, Issue 14
  • DOI: 10.1093/nar/gkv657

The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes
journal, September 2008


Insights into the phylogeny and coding potential of microbial dark matter
journal, July 2013

  • Rinke, Christian; Schwientek, Patrick; Sczyrba, Alexander
  • Nature, Vol. 499, Issue 7459
  • DOI: 10.1038/nature12352

EMBOSS: The European Molecular Biology Open Software Suite
journal, June 2000


DNA–DNA hybridization values and their relationship to whole-genome sequence similarities
journal, January 2007

  • Klappenbach, Joel A.; Goris, Johan; Vandamme, Peter
  • International Journal of Systematic and Evolutionary Microbiology, Vol. 57, Issue 1
  • DOI: 10.1099/ijs.0.64483-0

Status of the Archaeal and Bacterial Census: an Update
journal, May 2016


Bacterial species may exist, metagenomics reveal
journal, December 2011


Genomic Insights into a New Citrobacter koseri Strain Revealed Gene Exchanges with the Virulence-Associated Yersinia pestis pPCP1 Plasmid
journal, March 2016


    Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.