JGI QC impact on assembly, binning, phylogenomics, and functional analysis
Abstract
Background: Investigators using metagenomic sequencing to study their microbiomes are often provided data that have been trimmed and decontaminated, or they perform these steps themselves, without knowing the effect these procedures can have on downstream analyses. Here we evaluated the impact that JGI trimming and decontamination procedures had on assembly and binning metrics, placement of metagenome-assembled genomes (MAGs) into species trees, and functional profiles of MAGs extracted from twenty-three complex rhizosphere metagenomes. We also investigated how more aggressive trimming impacts these binning metrics.
Results: We found that JGI trimming and decontamination of input reads had some significant impacts on assembly and binning metrics compared to raw reads, and that differences in placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Trimming more aggressive than that used by JGI was found to reduce MAG counts.
Conclusions: Mild trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, mild trimming and decontamination of metagenomic reads with high quality scores is recommended for those who elect to do so.
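The abstract's comparison of MAG counts across completeness and contamination thresholds can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Mag` class, the `count_passing` helper, the bin names, and all numeric values are hypothetical and are not data or code from the study. The 90/5 and 50/10 threshold pairs loosely follow the completeness and contamination components of the MIMAG high-quality and medium-quality draft categories; the 70/10 pair is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mag:
    """One metagenome-assembled genome with CheckM-style quality estimates."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent, 0-100


def count_passing(mags, min_completeness, max_contamination):
    """Count MAGs at or above the completeness floor and at or below the contamination ceiling."""
    return sum(
        1
        for m in mags
        if m.completeness >= min_completeness and m.contamination <= max_contamination
    )


# Hypothetical, made-up values for illustration only; not data from the study.
raw_read_mags = [
    Mag("bin.1", 96.0, 2.0),
    Mag("bin.2", 72.0, 6.0),
    Mag("bin.3", 55.0, 9.0),
]
qc_read_mags = [
    Mag("bin.1", 94.0, 1.5),
    Mag("bin.2", 48.0, 4.0),
]

# Assumed threshold pairs: (minimum completeness %, maximum contamination %).
thresholds = [(90.0, 5.0), (70.0, 10.0), (50.0, 10.0)]

for min_comp, max_cont in thresholds:
    raw_n = count_passing(raw_read_mags, min_comp, max_cont)
    qc_n = count_passing(qc_read_mags, min_comp, max_cont)
    print(f">={min_comp}% complete, <={max_cont}% contaminated: raw={raw_n}, qc={qc_n}")
```

In practice the per-bin estimates would come from a quality-assessment tool such as CheckM rather than being typed in by hand. Tabulating counts per threshold pair for each read treatment is one simple way to see whether a difference between treatments depends on how strictly MAGs are filtered.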
- Authors:
- Whitham, Jason
- North Carolina State Univ., Raleigh, NC (United States)
- Publication Date:
- Research Org.:
- North Carolina State University, Raleigh, NC (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
- Keywords:
- metagenomics, decontamination, assembly, binning, phylogenomics, functional analysis
- OSTI Identifier:
- 1779219
- DOI:
- https://doi.org/10.25982/62657.1515/1779219
Citation Formats
Whitham, Jason. JGI QC impact on assembly, binning, phylogenomics, and functional analysis. United States: N. p., 2021. Web. doi:10.25982/62657.1515/1779219.
Whitham, Jason. JGI QC impact on assembly, binning, phylogenomics, and functional analysis. United States. doi:https://doi.org/10.25982/62657.1515/1779219
Whitham, Jason. 2021. "JGI QC impact on assembly, binning, phylogenomics, and functional analysis". United States. doi:https://doi.org/10.25982/62657.1515/1779219. https://www.osti.gov/servlets/purl/1779219. Pub date: January 1, 2021.
@article{osti_1779219,
title = {JGI QC impact on assembly, binning, phylogenomics, and functional analysis},
author = {Whitham, Jason},
abstractNote = {Background: Investigators using metagenomic sequencing to study their microbiomes are often provided data that have been trimmed and decontaminated, or they perform these steps themselves, without knowing the effect these procedures can have on downstream analyses. Here we evaluated the impact that JGI trimming and decontamination procedures had on assembly and binning metrics, placement of metagenome-assembled genomes (MAGs) into species trees, and functional profiles of MAGs extracted from twenty-three complex rhizosphere metagenomes. We also investigated how more aggressive trimming impacts these binning metrics. Results: We found that JGI trimming and decontamination of input reads had some significant impacts on assembly and binning metrics compared to raw reads, and that differences in placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Trimming more aggressive than that used by JGI was found to reduce MAG counts. Conclusions: Mild trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, mild trimming and decontamination of metagenomic reads with high quality scores is recommended for those who elect to do so.},
doi = {10.25982/62657.1515/1779219},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2021},
month = {1}
}
Works referenced in this record:
QUAST: quality assessment tool for genome assemblies
journal, February 2013
- Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay
- Bioinformatics, Vol. 29, Issue 8
Metagenomic analysis of the rhizosphere of three biofuel crops at the KBS intensive site
dataset, January 2013
- Tiedje, James
- DOE Joint Genome Institute
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
journal, May 2015
- Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.
- Genome Research, Vol. 25, Issue 7
Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea
journal, August 2017
- Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
- Nature Biotechnology, Vol. 35, Issue 8
The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities
journal, October 2020
- Chen, I-Min A.; Chu, Ken; Palaniappan, Krishnaveni
- Nucleic Acids Research, Vol. 49, Issue D1
Trace gas oxidizers are widespread and active members of soil microbial communities
journal, January 2021
- Bay, Sean K.; Dong, Xiyang; Bradley, James A.
- Nature Microbiology, Vol. 6, Issue 2
Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets
journal, July 2020
- Yue, Yi; Huang, Hao; Qi, Zhao
- BMC Bioinformatics, Vol. 21, Issue 1
Identification, variation and transcription of pneumococcal repeat sequences
journal, February 2011
- Croucher, Nicholas J.; Vernikos, Georgios S.; Parkhill, Julian
- BMC Genomics, Vol. 12, Issue 1
FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments
journal, March 2010
- Price, Morgan N.; Dehal, Paramvir S.; Arkin, Adam P.
- PLoS ONE, Vol. 5, Issue 3
MetaQUAST: evaluation of metagenome assemblies
journal, November 2015
- Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey
- Bioinformatics, Vol. 32, Issue 7
Prodigal: prokaryotic gene recognition and translation initiation site identification
journal, March 2010
- Hyatt, Doug; Chen, Gwo-Liang; LoCascio, Philip F.
- BMC Bioinformatics, Vol. 11, Issue 1
MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities
journal, January 2015
- Kang, Dongwan D.; Froula, Jeff; Egan, Rob
- PeerJ, Vol. 3
TIGRFAMs: a protein family resource for the functional identification of proteins
journal, January 2001
- Haft, D. H.
- Nucleic Acids Research, Vol. 29, Issue 1
MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm
journal, August 2014
- Wu, Yu-Wei; Tang, Yung-Hsu; Tringe, Susannah G.
- Microbiome, Vol. 2, Issue 1
Microbial Community Analysis with Ribosomal Gene Fragments from Shotgun Metagenomes
journal, October 2015
- Guo, Jiarong; Cole, James R.; Zhang, Qingpeng
- Applied and Environmental Microbiology, Vol. 82, Issue 1
COG database update: focus on microbial diversity, model organisms, and widespread pathogens
journal, November 2020
- Galperin, Michael Y.; Wolf, Yuri I.; Makarova, Kira S.
- Nucleic Acids Research, Vol. 49, Issue D1
The RAST Server: Rapid Annotations using Subsystems Technology
journal, January 2008
- Aziz, Ramy K.; Bartels, Daniela; Best, Aaron A.
- BMC Genomics, Vol. 9, Issue 1, Article No. 75
IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth
journal, April 2012
- Peng, Y.; Leung, H. C. M.; Yiu, S. M.
- Bioinformatics, Vol. 28, Issue 11
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
journal, January 2018
- Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.
- Nucleic Acids Research, Vol. 46, Issue 6
KBase: The United States Department of Energy Systems Biology Knowledgebase
journal, July 2018
- Arkin, Adam P.; Cottingham, Robert W.; Henry, Christopher S.
- Nature Biotechnology, Vol. 36, Issue 7
Icarus: visualizer for de novo assembly evaluation
journal, July 2016
- Mikheenko, Alla; Valin, Gleb; Prjibelski, Andrey
- Bioinformatics, Vol. 32, Issue 21
Accelerated Profile HMM Searches
journal, October 2011
- Eddy, Sean R.
- PLoS Computational Biology, Vol. 7, Issue 10
Using SPAdes De Novo Assembler
journal, June 2020
- Prjibelski, Andrey; Antipov, Dmitry; Meleshko, Dmitry
- Current Protocols in Bioinformatics, Vol. 70, Issue 1
Pfam: The protein families database in 2021
journal, October 2020
- Mistry, Jaina; Chuguransky, Sara; Williams, Lowri
- Nucleic Acids Research, Vol. 49, Issue D1
BLAST+: architecture and applications
journal, January 2009
- Camacho, Christiam; Coulouris, George; Avagyan, Vahram
- BMC Bioinformatics, Vol. 10, Issue 1
KBase Silver Case Study: Determining Media Formulation Requirements for Isolation of Microbiome Constituents
dataset, January 2021
- Whitham, Jason
- North Carolina State University
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
journal, January 2015
- Li, Dinghua; Liu, Chi-Man; Luo, Ruibang
- Bioinformatics, Vol. 31, Issue 10
Jupyter Notebooks – a publishing format for reproducible computational workflows
book, January 2021
- Kluyver, Thomas; Ragan-Kelley, Benjamin; Perez, Fernando
- Positioning and Power in Academic Publishing: Players, Agents and Agendas
The Importance of Accounting for Correlated Observations
journal, September 2010
- Sainani, Kristin
- PM&R, Vol. 2, Issue 9
ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data
journal, February 2016
- Huerta-Cepas, Jaime; Serra, François; Bork, Peer
- Molecular Biology and Evolution, Vol. 33, Issue 6
Data Analysis Using Regression and Multilevel/Hierarchical Models
book, January 2006
- Gelman, Andrew; Hill, Jennifer
RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes
journal, February 2015
- Brettin, Thomas; Davis, James J.; Disz, Terry
- Scientific Reports, Vol. 5, Issue 1
MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets
journal, October 2015
- Wu, Yu-Wei; Simmons, Blake A.; Singer, Steven W.
- Bioinformatics, Vol. 32, Issue 4
Genomes OnLine Database (GOLD) v.8: overview and updates
journal, November 2020
- Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon
- Nucleic Acids Research, Vol. 49, Issue D1