DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Effects of error, chimera, bias, and GC content on the accuracy of amplicon sequencing

Abstract

Targeted amplicon sequencing is widely used in microbial ecology studies. However, sequencing artifacts and amplification biases are of great concern. To identify sources of these artifacts, a systematic analysis was performed using mock communities comprised of 16S rRNA genes from 33 bacterial strains. Our results indicated that while sequencing errors were generally isolated to low-abundance operational taxonomic units, chimeric sequences were a major source of artifacts. Singleton and doubleton sequences were primarily chimeras. Formation of chimeric sequences was significantly correlated with the GC content of the targeted sequences. Low-GC-content mock community members exhibited lower rates of chimeric sequence formation. GC content also had a large impact on sequence recovery. The quantitative capacity was notably limited, with substantial recovery variations and weak correlation between anticipated and observed strain abundances. The mock community strains with higher GC content had higher recovery rates than strains with lower GC content. Amplification bias was also observed due to differences in primer affinity. A two-step PCR strategy reduced the number of chimeric sequences by half. In addition, comparative analyses based on the mock communities showed that several widely used sequence processing pipelines/methods, including DADA2, Deblur, UCLUST, UNOISE, and UPARSE, had different advantages and disadvantages in artifact removal and rare species detection. These results are important for improving sequencing quality and reliability and developing new algorithms to process targeted amplicon sequences.

IMPORTANCE Amplicon sequencing of targeted genes is the predominant approach to estimate the membership and structure of microbial communities. However, accurate reconstruction of community composition is difficult due to sequencing errors and other methodological biases, and effective approaches to overcome these challenges are essential. Using a mock community of 33 phylogenetically diverse strains, this study evaluated the effect of GC content on sequencing results and tested different approaches to improve overall sequencing accuracy while characterizing the pros and cons of popular amplicon sequence data processing approaches. The sequencing results from this study can serve as a benchmarking data set for future algorithmic improvements. Furthermore, the new insights on sequencing error, chimera formation, and GC bias from this study will help enhance the quality of amplicon sequencing studies and support the development of new data analysis approaches.
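
The abstract relates two quantities, the GC content of each target sequence and the recovery of each mock community strain relative to its anticipated abundance. The sketch below shows one plausible way to compute both. It is illustrative only: the function names `gc_content` and `recovery_ratio`, and the definition of recovery as observed relative abundance divided by expected relative abundance, are assumptions rather than the authors' stated method.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous A/C/G/T bases.

    Ambiguity codes (N, R, Y, ...) and gap characters are ignored.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    total = gc + s.count("A") + s.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / total


def recovery_ratio(observed_reads: int, total_reads: int,
                   expected_fraction: float) -> float:
    """Observed relative abundance divided by expected relative abundance.

    This definition is an assumption made for illustration. A value of 1.0
    means the strain was recovered at exactly its anticipated proportion;
    values below 1.0 indicate under-recovery.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if expected_fraction <= 0:
        raise ValueError("expected_fraction must be positive")
    return (observed_reads / total_reads) / expected_fraction
```

Pairing `gc_content` of each strain's amplicon with its `recovery_ratio` gives the per-strain points one would examine when asking whether recovery tracks GC content.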

Authors:
Qin, Yujia [1]; Wu, Liyou [1]; Zhang, Qiuting [1]; Wen, Chongqin [2]; Van Nostrand, Joy D. [1]; Ning, Daliang [1]; Raskin, Lutgarde [3]; Pinto, Ameet [4]; Zhou, Jizhong [5]
  1. Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
  2. Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA, Fisheries College, Guangdong Ocean University, Zhanjiang, Guangdong, China
  3. Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan, USA
  4. School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
  5. Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA, State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing, China, Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA, School of Civil Engineering and Environmental Sciences, University of Oklahoma, Norman, Oklahoma, USA, School of Computer Science, University of Oklahoma, Norman, Oklahoma, USA
Publication Date:
2023-12-21
Sponsoring Org.:
USDOE
OSTI Identifier:
2228371
Grant/Contract Number:  
SC0014079, DE-SC0004601, DE-SC0010715, DE-SC0014079; AC02-05CH11231
Resource Type:
Published Article
Journal Name:
mSystems
Additional Journal Information:
Journal Name: mSystems; Journal Volume: 8; Journal Issue: 6; Journal ID: ISSN 2379-5077
Publisher:
American Society for Microbiology
Country of Publication:
United States
Language:
English

Citation Formats

Qin, Yujia, Wu, Liyou, Zhang, Qiuting, Wen, Chongqin, Van Nostrand, Joy D., Ning, Daliang, Raskin, Lutgarde, Pinto, Ameet, Zhou, Jizhong, and Gilbert, ed., Jack A. Effects of error, chimera, bias, and GC content on the accuracy of amplicon sequencing. United States: N. p., 2023. Web. doi:10.1128/msystems.01025-23.
Qin, Yujia, Wu, Liyou, Zhang, Qiuting, Wen, Chongqin, Van Nostrand, Joy D., Ning, Daliang, Raskin, Lutgarde, Pinto, Ameet, Zhou, Jizhong, & Gilbert, ed., Jack A. Effects of error, chimera, bias, and GC content on the accuracy of amplicon sequencing. United States. https://doi.org/10.1128/msystems.01025-23
Qin, Yujia, Wu, Liyou, Zhang, Qiuting, Wen, Chongqin, Van Nostrand, Joy D., Ning, Daliang, Raskin, Lutgarde, Pinto, Ameet, Zhou, Jizhong, and Gilbert, ed., Jack A. 2023. "Effects of error, chimera, bias, and GC content on the accuracy of amplicon sequencing". United States. https://doi.org/10.1128/msystems.01025-23.
@article{osti_2228371,
title = {Effects of error, chimera, bias, and GC content on the accuracy of amplicon sequencing},
author = {Qin, Yujia and Wu, Liyou and Zhang, Qiuting and Wen, Chongqin and Van Nostrand, Joy D. and Ning, Daliang and Raskin, Lutgarde and Pinto, Ameet and Zhou, Jizhong and Gilbert, ed., Jack A.},
doi = {10.1128/msystems.01025-23},
journal = {mSystems},
number = 6,
volume = 8,
place = {United States},
year = {2023},
month = {12}
}
