DOE Data Explorer, U.S. Department of Energy
Office of Scientific and Technical Information

Title: JGI QC impact on assembly, binning, phylogenomics, and functional analysis

Abstract

Background: Investigators who use metagenomic sequencing to study microbiomes are often given data that have already been trimmed and decontaminated, or they perform these steps themselves, without knowing how the procedures affect downstream analyses. Here we evaluated the impact of JGI trimming and decontamination procedures on assembly and binning metrics, on placement of metagenome-assembled genomes (MAGs) into species trees, and on functional profiles of MAGs extracted from twenty-three complex rhizosphere metagenomes. We also investigated how more aggressive trimming affects these binning metrics.

Results: JGI trimming and decontamination of input reads had some significant impacts on assembly and binning metrics compared to raw reads, and differences in the placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Trimming more aggressive than that used by JGI reduced MAG counts.

Conclusions: Mild trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, mild trimming and decontamination of metagenomic reads with high quality scores is recommended for those who elect to do so.

Authors:
Whitham, Jason
  1. North Carolina State Univ., Raleigh, NC (United States)
Publication Date:
2021-01-01
Research Org.:
North Carolina State Univ., Raleigh, NC (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Subject:
59 BASIC BIOLOGICAL SCIENCES
Keywords:
metagenomics, decontamination, assembly, binning, phylogenomics, functional analysis
OSTI Identifier:
1779219
DOI:
https://doi.org/10.25982/62657.1515/1779219

Citation Formats

Whitham, Jason. JGI QC impact on assembly, binning, phylogenomics, and functional analysis. United States: N. p., 2021. Web. doi:10.25982/62657.1515/1779219.
Whitham, Jason. JGI QC impact on assembly, binning, phylogenomics, and functional analysis. United States. doi:https://doi.org/10.25982/62657.1515/1779219
Whitham, Jason. 2021. "JGI QC impact on assembly, binning, phylogenomics, and functional analysis". United States. doi:https://doi.org/10.25982/62657.1515/1779219. https://www.osti.gov/servlets/purl/1779219. Pub date: Fri Jan 01 00:00:00 EST 2021
@article{osti_1779219,
title = {JGI QC impact on assembly, binning, phylogenomics, and functional analysis},
author = {Whitham, Jason},
abstractNote = {Background: Investigators who use metagenomic sequencing to study microbiomes are often given data that have already been trimmed and decontaminated, or they perform these steps themselves, without knowing how the procedures affect downstream analyses. Here we evaluated the impact of JGI trimming and decontamination procedures on assembly and binning metrics, on placement of metagenome-assembled genomes (MAGs) into species trees, and on functional profiles of MAGs extracted from twenty-three complex rhizosphere metagenomes. We also investigated how more aggressive trimming affects these binning metrics. Results: JGI trimming and decontamination of input reads had some significant impacts on assembly and binning metrics compared to raw reads, and differences in the placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Trimming more aggressive than that used by JGI reduced MAG counts. Conclusions: Mild trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, mild trimming and decontamination of metagenomic reads with high quality scores is recommended for those who elect to do so.},
doi = {10.25982/62657.1515/1779219},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2021},
month = {1}
}
