Impact of BBDuk metagenomic read trimming and decontamination
Abstract
Background: Investigators who use metagenomic sequencing to study microbiomes are often given data that have already been trimmed and decontaminated, or they perform these steps themselves, without knowing how the procedures affect downstream analyses. Here we evaluated the impact of JGI trimming and decontamination procedures on assembly and binning metrics, on the placement of metagenome-assembled genomes (MAGs) in species trees, and on the functional profiles of MAGs extracted from twenty-three complex rhizosphere metagenomes. We also investigated how more aggressive trimming affects these binning metrics.
Results: JGI trimming and decontamination of input reads had some significant impacts on assembly and binning metrics compared to raw reads, and differences in the placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Trimming more aggressive than that used by JGI was found to reduce MAG counts.
Conclusions: Mild trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, for those who elect to trim and decontaminate, mild trimming and decontamination of metagenomic reads with high quality scores is recommended.
- Authors:
- Whitham, Jason
- North Carolina State Univ., Raleigh, NC (United States)
- Publication Date:
- 2021-01-01
- Research Org.:
- North Carolina State University, Raleigh, NC (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
- Keywords:
- metagenomics, decontamination, assembly, binning, phylogenomics, functional analysis
- OSTI Identifier:
- 1779218
- DOI:
- https://doi.org/10.25982/77705.1341/1779218
Citation Formats
Whitham, Jason. Impact of BBDuk metagenomic read trimming and decontamination. United States: N. p., 2021. Web. doi:10.25982/77705.1341/1779218.
Whitham, Jason. Impact of BBDuk metagenomic read trimming and decontamination. United States. doi:https://doi.org/10.25982/77705.1341/1779218
Whitham, Jason. 2021. "Impact of BBDuk metagenomic read trimming and decontamination". United States. doi:https://doi.org/10.25982/77705.1341/1779218. https://www.osti.gov/servlets/purl/1779218. Pub date: Fri Jan 01 00:00:00 EST 2021
@article{osti_1779218,
title = {Impact of BBDuk metagenomic read trimming and decontamination},
author = {Whitham, Jason},
abstractNote = {Background: Investigators who use metagenomic sequencing to study microbiomes are often given data that have already been trimmed and decontaminated, or they perform these steps themselves, without knowing how the procedures affect downstream analyses. Here we evaluated the impact of JGI trimming and decontamination procedures on assembly and binning metrics, on the placement of metagenome-assembled genomes (MAGs) in species trees, and on the functional profiles of MAGs extracted from twenty-three complex rhizosphere metagenomes. We also investigated how more aggressive trimming affects these binning metrics. Results: JGI trimming and decontamination of input reads had some significant impacts on assembly and binning metrics compared to raw reads, and differences in the placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Trimming more aggressive than that used by JGI was found to reduce MAG counts. Conclusions: Mild trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, for those who elect to trim and decontaminate, mild trimming and decontamination of metagenomic reads with high quality scores is recommended.},
doi = {10.25982/77705.1341/1779218},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2021},
month = {1}
}
Works referenced in this record:
IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth
journal, April 2012
- Peng, Y.; Leung, H. C. M.; Yiu, S. M.
- Bioinformatics, Vol. 28, Issue 11
QUAST: quality assessment tool for genome assemblies
journal, February 2013
- Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay
- Bioinformatics, Vol. 29, Issue 8
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
journal, January 2018
- Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.
- Nucleic Acids Research, Vol. 46, Issue 6
Metagenomic analysis of the rhizosphere of three biofuel crops at the KBS intensive site
dataset, January 2013
- Tiedje, James
- DOE Joint Genome Institute
KBase: The United States Department of Energy Systems Biology Knowledgebase
journal, July 2018
- Arkin, Adam P.; Cottingham, Robert W.; Henry, Christopher S.
- Nature Biotechnology, Vol. 36, Issue 7
Icarus: visualizer for de novo assembly evaluation
journal, July 2016
- Mikheenko, Alla; Valin, Gleb; Prjibelski, Andrey
- Bioinformatics, Vol. 32, Issue 21
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
journal, May 2015
- Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.
- Genome Research, Vol. 25, Issue 7
Using SPAdes De Novo Assembler
journal, June 2020
- Prjibelski, Andrey; Antipov, Dmitry; Meleshko, Dmitry
- Current Protocols in Bioinformatics, Vol. 70, Issue 1
Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea
journal, August 2017
- Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
- Nature Biotechnology, Vol. 35, Issue 8
The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities
journal, October 2020
- Chen, I-Min A.; Chu, Ken; Palaniappan, Krishnaveni
- Nucleic Acids Research, Vol. 49, Issue D1
Pfam: The protein families database in 2021
journal, October 2020
- Mistry, Jaina; Chuguransky, Sara; Williams, Lowri
- Nucleic Acids Research, Vol. 49, Issue D1
KBase Silver Case Study: Determining Media Formulation Requirements for Isolation of Microbiome Constituents
dataset, January 2021
- Whitham, Jason
- North Carolina State University
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
journal, January 2015
- Li, Dinghua; Liu, Chi-Man; Luo, Ruibang
- Bioinformatics, Vol. 31, Issue 10
Trace gas oxidizers are widespread and active members of soil microbial communities
journal, January 2021
- Bay, Sean K.; Dong, Xiyang; Bradley, James A.
- Nature Microbiology, Vol. 6, Issue 2
Jupyter Notebooks – a publishing format for reproducible computational workflows
book, January 2021
- Kluyver, Thomas; Ragan-Kelley, Benjamin; Perez, Fernando
- Positioning and Power in Academic Publishing: Players, Agents and Agendas
The Importance of Accounting for Correlated Observations
journal, September 2010
- Sainani, Kristin
- PM&R, Vol. 2, Issue 9
Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets
journal, July 2020
- Yue, Yi; Huang, Hao; Qi, Zhao
- BMC Bioinformatics, Vol. 21, Issue 1
ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data
journal, February 2016
- Huerta-Cepas, Jaime; Serra, François; Bork, Peer
- Molecular Biology and Evolution, Vol. 33, Issue 6
FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments
journal, March 2010
- Price, Morgan N.; Dehal, Paramvir S.; Arkin, Adam P.
- PLoS ONE, Vol. 5, Issue 3
Data Analysis Using Regression and Multilevel/Hierarchical Models
book, January 2006
- Gelman, Andrew; Hill, Jennifer
MetaQUAST: evaluation of metagenome assemblies
journal, November 2015
- Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey
- Bioinformatics, Vol. 32, Issue 7
MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities
journal, January 2015
- Kang, Dongwan D.; Froula, Jeff; Egan, Rob
- PeerJ, Vol. 3
TIGRFAMs: a protein family resource for the functional identification of proteins
journal, January 2001
- Haft, D. H.
- Nucleic Acids Research, Vol. 29, Issue 1
RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes
journal, February 2015
- Brettin, Thomas; Davis, James J.; Disz, Terry
- Scientific Reports, Vol. 5, Issue 1
MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets
journal, October 2015
- Wu, Yu-Wei; Simmons, Blake A.; Singer, Steven W.
- Bioinformatics, Vol. 32, Issue 4
Microbial Community Analysis with Ribosomal Gene Fragments from Shotgun Metagenomes
journal, October 2015
- Guo, Jiarong; Cole, James R.; Zhang, Qingpeng
- Applied and Environmental Microbiology, Vol. 82, Issue 1
COG database update: focus on microbial diversity, model organisms, and widespread pathogens
journal, November 2020
- Galperin, Michael Y.; Wolf, Yuri I.; Makarova, Kira S.
- Nucleic Acids Research, Vol. 49, Issue D1
Genomes OnLine Database (GOLD) v.8: overview and updates
journal, November 2020
- Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon
- Nucleic Acids Research, Vol. 49, Issue D1