OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs

Abstract

Assembly of metagenomic samples is a very complex process, with algorithms designed to address sequencing platform-specific issues (read length, data volume, and/or community complexity) while also faced with genomes that differ greatly in nucleotide compositional biases and in abundance. To address these issues, we have developed a post-assembly process: MetaGenomic Assembly by Merging (MeGAMerge). We compare this process to the performance of several assemblers, using both real and in-silico generated samples of different community composition and complexity. MeGAMerge consistently outperforms individual assembly methods, producing larger contigs with an increased number of predicted genes, without replication of data. MeGAMerge contigs are supported by read mapping and contig alignment data, when using synthetically derived and real metagenomic data, as well as by gene prediction analyses and similarity searches. Ultimately, MeGAMerge is a flexible method that generates improved metagenome assemblies, with the ability to accommodate upcoming sequencing platforms, as well as present and future assembly algorithms.
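The abstract's claim of "larger contigs" is commonly quantified with contig-length summary statistics such as N50. The Python sketch below illustrates that idea under an assumption only: the abstract does not name the metrics used, and the functions `n50` and `contig_stats` and the example contig lengths are hypothetical.

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input: no contigs, no N50


def contig_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Summarize an assembly by contig count, total bases, longest contig and N50."""
    values = list(lengths)
    return {
        "contigs": len(values),
        "total_bp": sum(values),
        "max_bp": max(values, default=0),
        "n50": n50(values),
    }


if __name__ == "__main__":
    # Hypothetical contig lengths for two assemblies of the same sample.
    single = [300, 450, 500, 800, 1200]
    merged = [450, 1500, 1300]
    print(contig_stats(single))
    print(contig_stats(merged))
```

In this hypothetical example both assemblies total 3,250 bp, but N50 rises from 800 bp to 1,300 bp, the kind of shift expected when the same bases end up in fewer, longer contigs.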

Authors:
 Scholz, Matthew [1]; Lo, Chien-Chi [1]; Chain, Patrick S. G. [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States); USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
Publication Date:
Wed Oct 01 00:00:00 EDT 2014
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE Office of Science (SC); U.S. Department of Homeland Security
OSTI Identifier:
1259288
Grant/Contract Number:  
AC02-05CH11231; HSHQDC08X00790; B104153I; B084531I
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Scientific Reports
Additional Journal Information:
Journal Volume: 4; Journal ID: ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; genome assembly algorithms; genomics; metagenomics; next-generation sequencing

Citation Formats

Scholz, Matthew, Lo, Chien-Chi, and Chain, Patrick S. G. Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs. United States: N. p., 2014. Web. doi:10.1038/srep06480.
Scholz, Matthew, Lo, Chien-Chi, & Chain, Patrick S. G. Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs. United States. https://doi.org/10.1038/srep06480
Scholz, Matthew, Lo, Chien-Chi, and Chain, Patrick S. G. 2014. "Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs". United States. https://doi.org/10.1038/srep06480. https://www.osti.gov/servlets/purl/1259288.
@article{osti_1259288,
title = {Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs},
author = {Scholz, Matthew and Lo, Chien-Chi and Chain, Patrick S. G.},
abstractNote = {Assembly of metagenomic samples is a very complex process, with algorithms designed to address sequencing platform-specific issues (read length, data volume, and/or community complexity) while also faced with genomes that differ greatly in nucleotide compositional biases and in abundance. To address these issues, we have developed a post-assembly process: MetaGenomic Assembly by Merging (MeGAMerge). We compare this process to the performance of several assemblers, using both real and in-silico generated samples of different community composition and complexity. MeGAMerge consistently outperforms individual assembly methods, producing larger contigs with an increased number of predicted genes, without replication of data. MeGAMerge contigs are supported by read mapping and contig alignment data, when using synthetically derived and real metagenomic data, as well as by gene prediction analyses and similarity searches. Ultimately, MeGAMerge is a flexible method that generates improved metagenome assemblies, with the ability to accommodate upcoming sequencing platforms, as well as present and future assembly algorithms.},
doi = {10.1038/srep06480},
url = {https://www.osti.gov/biblio/1259288},
journal = {Scientific Reports},
issn = {2045-2322},
volume = {4},
place = {United States},
year = {2014},
month = oct
}


Citation Metrics:
Cited by: 28 works
Citation information provided by Web of Science.

Figures / Tables:

Figure 1: MeGAMerge pipeline for metagenomes. This diagram provides an overview of the MeGAMerge process, including optional steps for trimming sequencing data and the inclusion of optional assemblers for Illumina reads. Long-read or contig sets may include Sanger libraries, error-corrected PacBio reads (raw reads are likely to be too error-prone to be merged), and any other source of contigs. Input sequences shorter than 200 bp are removed by default, but this threshold can be changed. The MeGAMerge pipeline currently uses Newbler to assemble short contigs and Minimus2 as the final assembly stage.
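The default length filter described in the Figure 1 caption (drop input sequences shorter than 200 bp, with an adjustable threshold) can be sketched in Python as below. This is a hypothetical illustration, not MeGAMerge's own code: the minimal FASTA reader, the function names `read_fasta` and `filter_by_length`, and the `min_len` parameter are assumptions, and the downstream Newbler and Minimus2 stages are not shown.

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a FASTA stream, joining wrapped lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 200) -> List[Record]:
    """Keep only sequences of at least min_len bases (hypothetical default: 200 bp)."""
    return [(h, s) for h, s in records if len(s) >= min_len]


if __name__ == "__main__":
    # Hypothetical input: a 150 bp contig (dropped) and a 240 bp contig (kept).
    demo = io.StringIO(">c1\n" + "A" * 150 + "\n>c2\n" + "ACGT" * 60 + "\n")
    kept = filter_by_length(read_fasta(demo))
    print([h for h, _ in kept])
```

Exposing the threshold as a keyword argument mirrors the caption's note that the 200 bp default can be changed; a real pipeline would more likely use an established FASTA parser than this minimal reader.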

Works referencing / citing this record:

Integrated multi-omics of the human gut microbiome in a case study of familial type 1 diabetes
journal, October 2016


Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data
journal, May 2016


Wetland Sediments Host Diverse Microbial Taxa Capable of Cycling Alcohols
journal, April 2019


InteMAP: Integrated metagenomic assembly pipeline for NGS short reads
journal, August 2015


ICoVeR – an interactive visualization tool for verification and refinement of metagenomic bins
journal, May 2017


Recovering complete and draft population genomes from metagenome datasets
journal, March 2016


Viral and metabolic controls on high rates of microbial sulfur and carbon cycling in wetland ecosystems
journal, August 2018


Overview of Virus Metagenomic Classification Methods and Their Biological Applications
journal, April 2018


Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.