OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?

Abstract

The most common practice in studying and cataloguing prokaryotic diversity involves the grouping of sequences into operational taxonomic units (OTUs) at the 97% 16S rRNA gene sequence identity level, often using partial gene sequences, such as PCR-generated amplicons. Due to the high sequence conservation of rRNA genes, organisms belonging to closely related yet distinct species may be grouped under the same OTU. With this being said, it remains unclear how much diversity has been underestimated by this practice. To address this question, we compared the OTUs of genomes defined at the 97% or 98.5% 16S rRNA gene identity level against OTUs of the same genomes defined at the 95% whole-genome average nucleotide identity (ANI), which is a much more accurate proxy for species. Our results show that OTUs resulting from a 98.5% 16S rRNA gene identity cutoff are more accurate than 97% compared to 95% ANI (90.5% versus 89.9% accuracy) but indistinguishable from any other threshold in the 98.29 to 98.78% range. Even with the more stringent thresholds, the 16S rRNA gene-based approach commonly underestimates the number of OTUs by ~12%, on average, compared to the ANI-based approach (~14% underestimation when using the 97% identity threshold). Moreover, the degree of underestimation can become 50% or more for certain taxa, such as the genera Pseudomonas, Burkholderia, Escherichia, Campylobacter, and Citrobacter. These results provide a quantitative view of the degree of underestimation of extant prokaryotic diversity by 16S rRNA gene-defined OTUs and suggest that genomic resolution is often necessary.

IMPORTANCE: Species diversity is one of the most fundamental pieces of information for community ecology and conservational biology. Thus, employing accurate proxies for what a species or the unit of diversity is are cornerstones for a large set of microbial ecology and diversity studies. The most common proxies currently used rely on the clustering of 16S rRNA gene sequences at some threshold of nucleotide identity, typically 97% or 98.5%. Here, we explore how well this strategy reflects the more accurate whole-genome-based proxies and determine the frequency with which the high conservation of 16S rRNA sequences masks substantial species-level diversity.
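
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how sequences were clustered, so single-linkage clustering is assumed here, the underestimation formula is one plausible reading of "underestimates the number of OTUs by ~12%", and the four genomes and all identity values are hypothetical toy data.

```python
from itertools import combinations


def count_otus(identity, names, threshold):
    """Count single-linkage clusters (an assumed clustering rule): two genomes
    share an OTU if a chain of pairwise identities >= threshold links them."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent pointers to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if identity[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters
    return len({find(name) for name in names})


def underestimation(n_16s, n_ani):
    """Percent of ANI-defined OTUs not resolved by 16S-defined OTUs
    (an assumed definition, not taken from the paper)."""
    return 100.0 * (n_ani - n_16s) / n_ani


# Hypothetical toy data: pairwise identities (%) among four made-up genomes.
names = ["g1", "g2", "g3", "g4"]
ssu = {  # 16S rRNA gene identity
    frozenset(("g1", "g2")): 99.8,
    frozenset(("g1", "g3")): 99.0,
    frozenset(("g2", "g3")): 98.9,
    frozenset(("g1", "g4")): 93.0,
    frozenset(("g2", "g4")): 93.2,
    frozenset(("g3", "g4")): 92.8,
}
ani = {  # whole-genome average nucleotide identity
    frozenset(("g1", "g2")): 97.5,
    frozenset(("g1", "g3")): 88.0,
    frozenset(("g2", "g3")): 87.5,
    frozenset(("g1", "g4")): 75.0,
    frozenset(("g2", "g4")): 75.5,
    frozenset(("g3", "g4")): 74.0,
}

n_16s = count_otus(ssu, names, 98.5)  # 16S threshold named in the abstract
n_ani = count_otus(ani, names, 95.0)  # ANI threshold named in the abstract
print(n_16s, n_ani, round(underestimation(n_16s, n_ani), 1))
```

In this toy case g3 is nearly identical to g1 and g2 at the 16S level but falls well below 95% ANI, so the 16S-based clustering merges what the ANI-based clustering keeps as a separate OTU.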

Authors:
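Rodriguez-R, Luis M. [1]; Castro, Juan C. [1]; Kyrpides, Nikos C. [2]; Cole, James R. [3]; Tiedje, James M. [3]; Konstantinidis, Konstantinos T. [1]; Löffler, Frank E. [4]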
  1. Georgia Inst. of Technology, Atlanta, GA (United States)
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  3. Michigan State Univ., East Lansing, MI (United States)
  4. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
Research Org.:
Michigan State Univ., East Lansing, MI (United States); Univ. of California, Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Scientific User Facilities Division
OSTI Identifier:
1503621
Grant/Contract Number:  
FG02-99ER62848
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Applied and Environmental Microbiology
Additional Journal Information:
Journal Volume: 84; Journal Issue: 6; Journal ID: ISSN 0099-2240
Publisher:
American Society for Microbiology
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; average nucleotide identity; diversity; 16S rRNA gene

Citation Formats

Rodriguez-R, Luis M., Castro, Juan C., Kyrpides, Nikos C., Cole, James R., Tiedje, James M., Konstantinidis, Konstantinos T., and Löffler, Frank E. How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity? United States: N. p., 2018. Web. doi:10.1128/aem.00014-18.
Rodriguez-R, Luis M., Castro, Juan C., Kyrpides, Nikos C., Cole, James R., Tiedje, James M., Konstantinidis, Konstantinos T., & Löffler, Frank E. How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity? United States. doi:10.1128/aem.00014-18.
Rodriguez-R, Luis M., Castro, Juan C., Kyrpides, Nikos C., Cole, James R., Tiedje, James M., Konstantinidis, Konstantinos T., and Löffler, Frank E. 2018. "How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?" United States. doi:10.1128/aem.00014-18. https://www.osti.gov/servlets/purl/1503621.
@article{osti_1503621,
title = {How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?},
author = {Rodriguez-R, Luis M. and Castro, Juan C. and Kyrpides, Nikos C. and Cole, James R. and Tiedje, James M. and Konstantinidis, Konstantinos T. and Löffler, Frank E.},
doi = {10.1128/aem.00014-18},
journal = {Applied and Environmental Microbiology},
issn = {0099-2240},
number = 6,
volume = 84,
place = {United States},
year = {2018},
month = {1}
}
