Trimming and Decontamination of Metagenomic Data can Significantly Impact Assembly and Binning Metrics, Phylogenomic and Functional Analysis

Journal Article · Current Bioinformatics
 [1];  [1]
  1. Department of Plant and Microbial Biology, North Carolina State University, 4550A Thomas Hall, Box 7615, Raleigh, 27695, NC, United States of America

Background:

Investigators using metagenomic sequencing to study microbiomes often trim and decontaminate reads without knowing their effect on downstream analyses.

Objective:

This study was designed to evaluate the impacts JGI trimming and decontamination procedures have on assembly and binning metrics, placement of MAGs into species trees, and functional profiles of MAGs extracted from complex rhizosphere metagenomes, as well as how more aggressive trimming impacts these binning metrics.

Methods:

Twenty-three Miscanthus × giganteus rhizosphere metagenomes were subjected to different combinations and thresholds of force, k-mer, and quality trimming and decontamination using BBDuk. Reads were assembled and binned in KBase. Phylogenomic and statistical analyses were applied to evaluate the effects of trimming and decontamination on downstream analyses.
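
As a rough illustration of how such trimming combinations could be enumerated, the sketch below builds BBDuk command lines as argument lists without running them. The flags shown (ftl, ktrim, k, mink, hdist, qtrim, trimq, ref) are standard BBDuk options, but the file names, the helper functions, and the specific values are assumptions for illustration only; the exact settings used in the study are not given in this record.

```python
# Illustrative sketch: builds BBDuk command lines, does not execute them.
# File names, helper names, and parameter values are hypothetical.

def bbduk_trim_cmd(reads_in, reads_out, ftl=None, adapter_ref=None, trimq=None):
    """Return one BBDuk trimming command as a list of arguments."""
    cmd = ["bbduk.sh", f"in={reads_in}", f"out={reads_out}"]
    if ftl is not None:
        # Force trimming: remove bases to the left of this 0-based position.
        cmd.append(f"ftl={ftl}")
    if adapter_ref is not None:
        # k-mer trimming: cut reads at matches to the reference k-mers.
        cmd += [f"ref={adapter_ref}", "ktrim=r", "k=23", "mink=11", "hdist=1"]
    if trimq is not None:
        # Quality trimming on both read ends down to the given Phred score.
        cmd += ["qtrim=rl", f"trimq={trimq}"]
    return cmd


def bbduk_decontam_cmd(reads_in, reads_out, contaminant_ref, k=31, hdist=1):
    """Return one BBDuk k-mer filtering command that drops matching reads."""
    return [
        "bbduk.sh",
        f"in={reads_in}",
        f"out={reads_out}",
        f"ref={contaminant_ref}",
        f"k={k}",
        f"hdist={hdist}",
    ]


# Three hypothetical quality thresholds applied to the same raw read file.
commands = [
    bbduk_trim_cmd("raw.fq.gz", f"trimmed_q{q}.fq.gz",
                   adapter_ref="adapters.fa", trimq=q)
    for q in (6, 10, 20)
]

for cmd in commands:
    print(" ".join(cmd))
```

Trimming and decontamination are kept as separate commands here because k-mer trimming and k-mer filtering each take their own reference file; a real pipeline would chain the output of one step into the next.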

Results:

Compared with raw reads, JGI-trimmed and decontaminated (QC) reads had significant impacts on assembly and binning metrics. Raw assemblies had significantly higher total contig counts, more contigs longer than 10 kbp, and larger total lengths than QC assemblies, while QC MAGs had 2.0% lower average contamination than raw MAGs. We also found that differences in the placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Furthermore, aggressive trimming (Q20) was found to significantly reduce MAG counts.
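
To make the Q20 threshold concrete, the following minimal sketch converts a Phred quality score into the per-base error probability it encodes, using the standard relation P = 10^(-Q/10). It assumes that Q20 here denotes a Phred score of 20; the function name and the example scores are illustrative only.

```python
# Minimal sketch: standard Phred score to error probability conversion.
# Function name and example scores are illustrative.

def phred_to_error_prob(q):
    """Return the probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


for q in (10, 20, 30):
    print(f"Q{q}: error probability ~ {phred_to_error_prob(q):.4f}")
```

Under this relation, trimming to Q20 removes read ends whose bases have more than roughly a 1 in 100 chance of being miscalled, which discards more sequence than a milder threshold would.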

Conclusion:

Trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, mild trimming and decontamination of metagenomic reads with high quality scores is recommended for removing sample processing and sequencing artifacts.

Sponsoring Organization:
USDOE
Grant/Contract Number:
EE0008523
OSTI ID:
1992554
Journal Information:
Current Bioinformatics, Vol. 18, Issue 5; ISSN 1574-8936
Publisher:
Bentham Science Publishers Ltd.
Country of Publication:
Netherlands
Language:
English
