DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Xander: employing a novel method for efficient gene-targeted metagenomic assembly

Abstract

Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. In conclusion, Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.
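
The idea of guiding a walk through a de Bruijn graph with a per-position gene model can be illustrated with a minimal Python sketch. This is not Xander's implementation: the function names `build_de_bruijn` and `best_path`, the nucleotide-level `profile` (a list of per-position base scores standing in for a protein profile HMM), the `missing` penalty, and the exhaustive search are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of bases that extend it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[-1])
    return graph


def best_path(graph, start, profile, missing=-1.0):
    """Return (score, sequence) for the best walk that begins at `start` and
    takes exactly len(profile) steps. Step j that appends base b earns
    profile[j][b], or `missing` if the profile has no score for b.
    Hypothetical toy scoring, explored exhaustively (exponential in the
    worst case), so it is only suitable for tiny examples."""
    node_len = len(start)
    best = (float("-inf"), None)
    stack = [(0.0, start)]
    while stack:
        score, seq = stack.pop()
        pos = len(seq) - node_len
        if pos == len(profile):
            best = max(best, (score, seq))
            continue
        # Only extensions supported by an observed k-mer are considered.
        for base in graph.get(seq[-node_len:], ()):
            stack.append((score + profile[pos].get(base, missing), seq + base))
    return best


# Two overlapping toy reads create a branch after "GT"; the profile prefers "T".
reads = ["ACGTAC", "CGTTGA"]
graph = build_de_bruijn(reads, k=3)
profile = [{"G": 1.0}, {"T": 1.0}, {"T": 2.0, "A": 0.5}, {"G": 1.0, "C": 0.2}]
print(best_path(graph, "AC", profile))
```

In this sketch the graph supplies which extensions are supported by the reads and the profile supplies how well each extension fits the target gene, so the search assembles and scores a candidate sequence in one pass rather than assembling first and annotating afterwards.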

Authors:
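Wang, Qiong; Fish, Jordan A.; Gilman, Mariah; Sun, Yanni; Brown, C. Titus; Tiedje, James M.; Cole, James R.
Publication Date:
2015-08-05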
Research Org.:
Michigan State Univ., East Lansing, MI (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1503062
Alternate Identifier(s):
OSTI ID: 1454520
Grant/Contract Number:  
BER DE-FC02-07ER64494; FG02-99ER62848; SC0010715
Resource Type:
Published Article
Journal Name:
Microbiome
Additional Journal Information:
Journal Name: Microbiome; Journal Volume: 3; Journal Issue: 1; Journal ID: ISSN 2049-2618
Publisher:
Springer Science + Business Media
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Metagenomics; Assembly; Functional gene; HMM; Nitrogen cycle; nifH; nirK; Biofuel crop

Citation Formats

Wang, Qiong, Fish, Jordan A., Gilman, Mariah, Sun, Yanni, Brown, C. Titus, Tiedje, James M., and Cole, James R. Xander: employing a novel method for efficient gene-targeted metagenomic assembly. United Kingdom: N. p., 2015. Web. doi:10.1186/s40168-015-0093-6.
Wang, Qiong, Fish, Jordan A., Gilman, Mariah, Sun, Yanni, Brown, C. Titus, Tiedje, James M., & Cole, James R. Xander: employing a novel method for efficient gene-targeted metagenomic assembly. United Kingdom. https://doi.org/10.1186/s40168-015-0093-6
Wang, Qiong, Fish, Jordan A., Gilman, Mariah, Sun, Yanni, Brown, C. Titus, Tiedje, James M., and Cole, James R. 2015. "Xander: employing a novel method for efficient gene-targeted metagenomic assembly". United Kingdom. https://doi.org/10.1186/s40168-015-0093-6.
@article{osti_1503062,
title = {Xander: employing a novel method for efficient gene-targeted metagenomic assembly},
author = {Wang, Qiong and Fish, Jordan A. and Gilman, Mariah and Sun, Yanni and Brown, C. Titus and Tiedje, James M. and Cole, James R.},
abstractNote = {Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. In conclusion, Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.},
doi = {10.1186/s40168-015-0093-6},
journal = {Microbiome},
number = 1,
volume = 3,
place = {United Kingdom},
year = {2015},
month = aug
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1186/s40168-015-0093-6

Citation Metrics:
Cited by: 71 works
Citation information provided by
Web of Science

Works referencing / citing this record:

A multi-source domain annotation pipeline for quantitative metagenomic and metatranscriptomic functional profiling
journal, August 2018


MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data
journal, June 2016


MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs
journal, October 2017


Challenges and Approaches in Microbiome Research: From Fundamental to Applied
journal, August 2018


Colonic Butyrate-Producing Communities in Humans: an Overview Using Omics Data
journal, December 2017


Virulence factor activity relationships (VFARs): a bioinformatics perspective
journal, January 2017

  • Waseem, Hassan; Williams, Maggie R.; Stedtfeld, Tiffany
  • Environmental Science: Processes & Impacts, Vol. 19, Issue 3
  • DOI: 10.1039/c6em00689b

Metagenomic Insights into the Degradation of Resistant Starch by Human Gut Microbiota
journal, September 2018

  • Vital, Marius; Howe, Adina; Bergeron, Nathalie
  • Applied and Environmental Microbiology, Vol. 84, Issue 23
  • DOI: 10.1128/aem.01562-18

Metagenome and Metatranscriptome Analyses Using Protein Family Profiles
journal, July 2016


Community structure explains antibiotic resistance gene dynamics over a temperature gradient in soil
journal, February 2018


Distribution and Diversity of Rhodopsin-Producing Microbes in the Chesapeake Bay
journal, April 2018

  • Maresca, Julia A.; Miller, Kelsey J.; Keffer, Jessica L.
  • Applied and Environmental Microbiology, Vol. 84, Issue 13
  • DOI: 10.1128/aem.00137-18

New approaches for metagenome assembly with short reads
journal, February 2019

  • Ayling, Martin; Clark, Matthew D.; Leggett, Richard M.
  • Briefings in Bioinformatics, Vol. 21, Issue 2
  • DOI: 10.1093/bib/bbz020

Contrasting Pathways for Anaerobic Methane Oxidation in Gulf of Mexico Cold Seep Sediments
journal, February 2019


A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data
journal, March 2017

  • Roumpeka, Despoina D.; Wallace, R. John; Escalettes, Frank
  • Frontiers in Genetics, Vol. 8
  • DOI: 10.3389/fgene.2017.00023

Uncovering the trimethylamine-producing bacteria of the human gut microbiota
journal, May 2017


Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads
journal, January 2017


A global survey of arsenic-related genes in soil microbiomes
journal, May 2019


Computational profiling of the gut–brain axis: microflora dysbiosis insights to neurological disorders
journal, November 2017

  • Dovrolis, Nikolas; Kolios, George; Spyrou, George M.
  • Briefings in Bioinformatics, Vol. 20, Issue 3
  • DOI: 10.1093/bib/bbx154

Cellulosic biofuel contributions to a sustainable energy future: Choices and outcomes
journal, June 2017

  • Robertson, G. Philip; Hamilton, Stephen K.; Barham, Bradford L.
  • Science, Vol. 356, Issue 6345
  • DOI: 10.1126/science.aal2324

Machine learning meets genome assembly
journal, August 2018

  • Padovani de Souza, Kleber; Setubal, João Carlos; Ponce de Leon F. de Carvalho, André Carlos
  • Briefings in Bioinformatics, Vol. 20, Issue 6
  • DOI: 10.1093/bib/bby072

Microbial Community Responses to Increased Water and Organic Matter in the Arid Soils of the McMurdo Dry Valleys, Antarctica
journal, July 2016

  • Buelow, Heather N.; Winter, Ara S.; Van Horn, David J.
  • Frontiers in Microbiology, Vol. 7
  • DOI: 10.3389/fmicb.2016.01040

Distinct temporal diversity profiles for nitrogen cycling genes in a hyporheic microbiome
journal, January 2020