Xander: employing a novel method for efficient gene-targeted metagenomic assembly
Abstract
Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven computationally challenging, and current methods often assemble only fragmented partial genes. We present a novel method for targeted assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, with a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches, to create a novel combined weighted assembly graph. Xander performs assembly and annotation concomitantly, using the information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and two functional marker genes, first on Illumina data from a Human Microbiome Project (HMP)-defined community and then on 21 rhizosphere soil metagenomic datasets from three different crops, totaling over 800 Gbp of unassembled data. We compared our method with a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found that our method produced more, longer, and higher-quality gene sequences. In conclusion, Xander combines gene assignment with the rapid assembly of full-length or near-full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.
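The abstract's central idea, pairing de Bruijn graph nodes with positions in a gene model so that every contig extension is also scored against the gene of interest, can be illustrated with a minimal Python sketch. It is not Xander's implementation. As simplifying assumptions, the protein profile HMM is replaced by a toy nucleotide position-specific score table (`toy_pssm`, hypothetical; match states only, no insertions or deletions), the reads and seed k-mer are made up, and the best path is found by exhaustive layer-by-layer dynamic programming rather than by the guided search strategies that appear in the record's reference list (A*, k shortest paths). All function names are hypothetical.

```python
def kmer_set(reads, k):
    """Collect every k-mer observed in the reads; these are the de Bruijn graph nodes."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def successors(kmer, kmers):
    """Implicit de Bruijn edges: kmer -> kmer[1:] + b for each base b whose k-mer was observed."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def toy_pssm(consensus, match=2, mismatch=-1):
    """Hypothetical stand-in for a profile HMM: one score per (model position, base)."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


def best_path(seed, kmers, pssm):
    """Best-scoring walk through a combined graph whose nodes are (k-mer, model position).

    The seed k-mer is assumed to cover model positions 0..k-1. Each extension
    adds one base and advances the model position by one, scored by pssm.
    Returns (score, assembled sequence).
    """
    k = len(seed)
    # Current layer: k-mer -> (best score so far, sequence assembled so far).
    layer = {seed: (sum(pssm[j][b] for j, b in enumerate(seed)), seed)}
    for pos in range(k, len(pssm)):
        nxt = {}
        for kmer, (score, seq) in layer.items():
            for succ in successors(kmer, kmers):
                cand = (score + pssm[pos][succ[-1]], seq + succ[-1])
                # Keep only the best-scoring path reaching each (k-mer, pos) node.
                if succ not in nxt or cand[0] > nxt[succ][0]:
                    nxt[succ] = cand
        if not nxt:
            break  # no observed extension: stop with a partial-length path
        layer = nxt
    return max(layer.values())


if __name__ == "__main__":
    reads = ["ATGCAT", "ATGCTT"]  # made-up reads sharing the prefix ATGC
    kmers = kmer_set(reads, 3)
    print(best_path("ATG", kmers, toy_pssm("ATGCTT")))  # (12, 'ATGCTT')
    print(best_path("ATG", kmers, toy_pssm("ATGCAT")))  # (12, 'ATGCAT')
```

The same reads yield different contigs depending on which gene model guides the walk, which is the sense in which assembly and annotation happen together. Because every step advances the model position by one, the combined graph is acyclic even when the underlying de Bruijn graph has cycles (here `CAT` loops back to `ATG`), so the layer-by-layer search is always well defined.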
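- Authors:
- Wang, Qiong; Fish, Jordan A.; Gilman, Mariah; Sun, Yanni; Brown, C. Titus; Tiedje, James M.; Cole, James R.
- Publication Date:
- 2015-08-05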
- Research Org.:
- Michigan State Univ., East Lansing, MI (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1503062
- Alternate Identifier(s):
- OSTI ID: 1454520
- Grant/Contract Number:
- BER DE-FC02-07ER64494; FG02-99ER62848; SC0010715
- Resource Type:
- Published Article
- Journal Name:
- Microbiome
- Additional Journal Information:
- Journal Name: Microbiome; Journal Volume: 3; Journal Issue: 1; Journal ID: ISSN 2049-2618
- Publisher:
- Springer Science + Business Media
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; Metagenomics; Assembly; Functional gene; HMM; Nitrogen cycle; nifH; nirK; Biofuel crop
Citation Formats
Wang, Qiong, Fish, Jordan A., Gilman, Mariah, Sun, Yanni, Brown, C. Titus, Tiedje, James M., and Cole, James R. Xander: employing a novel method for efficient gene-targeted metagenomic assembly. United Kingdom: N. p., 2015. Web. doi:10.1186/s40168-015-0093-6.
Wang, Qiong, Fish, Jordan A., Gilman, Mariah, Sun, Yanni, Brown, C. Titus, Tiedje, James M., & Cole, James R. Xander: employing a novel method for efficient gene-targeted metagenomic assembly. United Kingdom. https://doi.org/10.1186/s40168-015-0093-6
Wang, Qiong, Fish, Jordan A., Gilman, Mariah, Sun, Yanni, Brown, C. Titus, Tiedje, James M., and Cole, James R. 2015. "Xander: employing a novel method for efficient gene-targeted metagenomic assembly". United Kingdom. https://doi.org/10.1186/s40168-015-0093-6.
@article{osti_1503062,
title = {Xander: employing a novel method for efficient gene-targeted metagenomic assembly},
author = {Wang, Qiong and Fish, Jordan A. and Gilman, Mariah and Sun, Yanni and Brown, C. Titus and Tiedje, James M. and Cole, James R.},
abstractNote = {Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. In conclusion, Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.},
doi = {10.1186/s40168-015-0093-6},
journal = {Microbiome},
number = 1,
volume = 3,
place = {United Kingdom},
year = {2015},
month = aug
}
Works referenced in this record:
UCHIME improves sensitivity and speed of chimera detection
journal, June 2011
- Edgar, Robert C.; Haas, Brian J.; Clemente, Jose C.
- Bioinformatics, Vol. 27, Issue 16
Fast and accurate short read alignment with Burrows-Wheeler transform
journal, May 2009
- Li, H.; Durbin, R.
- Bioinformatics, Vol. 25, Issue 14
Repetitive DNA and next-generation sequencing: computational challenges and solutions
journal, November 2011
- Treangen, Todd J.; Salzberg, Steven L.
- Nature Reviews Genetics, Vol. 13, Issue 1
Impact of different bioenergy crops on N-cycling bacterial and archaeal communities in soil: Impact of bioenergy crops on soil N-cycling archaea and bacteria
journal, August 2012
- Mao, Yuejian; Yannarell, Anthony C.; Davis, Sarah C.
- Environmental Microbiology, Vol. 15, Issue 3
A human gut microbial gene catalogue established by metagenomic sequencing
journal, March 2010
- Qin, Junjie; Li, Ruiqiang; Raes, Jeroen
- Nature, Vol. 464, Issue 7285
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
journal, January 2015
- Li, Dinghua; Liu, Chi-Man; Luo, Ruibang
- Bioinformatics, Vol. 31, Issue 10
Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs
journal, May 2004
- Chevreux, Bastien; Pfisterer, Thomas; Drescher, Bernd
- Genome Research, Vol. 14, Issue 6, p. 1147-1159
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data
journal, May 2011
- Miller, Christopher S.; Baker, Brett J.; Thomas, Brian C.
- Genome Biology, Vol. 12, Issue 5
MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads
journal, July 2012
- Namiki, Toshiaki; Hachiya, Tsuyoshi; Tanaka, Hideaki
- Nucleic Acids Research, Vol. 40, Issue 20
Finding the K Shortest Loopless Paths in a Network
journal, July 1971
- Yen, Jin Y.
- Management Science, Vol. 17, Issue 11
FLASH: fast length adjustment of short reads to improve genome assemblies
journal, September 2011
- Magoc, T.; Salzberg, S. L.
- Bioinformatics, Vol. 27, Issue 21
A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data
journal, August 2014
- Zhang, Yuan; Sun, Yanni; Cole, James R.
- PLoS Computational Biology, Vol. 10, Issue 8
The incidence of nirS and nirK and their genetic heterogeneity in cultivated denitrifiers
journal, November 2006
- Heylen, Kim; Gevers, Dirk; Vanparys, Bram
- Environmental Microbiology, Vol. 8, Issue 11
Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
journal, February 2008
- Zerbino, D. R.; Birney, E.
- Genome Research, Vol. 18, Issue 5
Comparative genome assembly
journal, January 2004
- Pop, M.
- Briefings in Bioinformatics, Vol. 5, Issue 3
A Procedure for Computing the K Best Solutions to Discrete Optimization Problems and Its Application to the Shortest Path Problem
journal, March 1972
- Lawler, Eugene L.
- Management Science, Vol. 18, Issue 7
The future is now: single-cell genomics of bacteria and archaea
journal, May 2013
- Blainey, Paul C.
- FEMS Microbiology Reviews, Vol. 37, Issue 3
Tackling soil diversity with the assembly of large, complex metagenomes
journal, March 2014
- Howe, Adina Chuang; Jansson, Janet K.; Malfatti, Stephanie A.
- Proceedings of the National Academy of Sciences, Vol. 111, Issue 13
What is a hidden Markov model?
journal, October 2004
- Eddy, Sean R.
- Nature Biotechnology, Vol. 22, Issue 10
Space/time trade-offs in hash coding with allowable errors
journal, July 1970
- Bloom, Burton H.
- Communications of the ACM, Vol. 13, Issue 7, p. 422-426
How to apply de Bruijn graphs to genome assembly
journal, November 2011
- Compeau, Phillip E. C.; Pevzner, Pavel A.; Tesler, Glenn
- Nature Biotechnology, Vol. 29, Issue 11
Quantitative Detection of the nosZ Gene, Encoding Nitrous Oxide Reductase, and Comparison of the Abundances of 16S rRNA, narG, nirK, and nosZ Genes in Soils
journal, August 2006
- Henry, S.; Bru, D.; Stres, B.
- Applied and Environmental Microbiology, Vol. 72, Issue 8
Environmental Genome Shotgun Sequencing of the Sargasso Sea
journal, April 2004
- Venter, J. C.
- Science, Vol. 304, Issue 5667
An algorithm for approximate membership checking with application to password security
journal, May 1994
- Manber, Udi; Wu, Sun
- Information Processing Letters, Vol. 50, Issue 4
Ecological Patterns of nifH Genes in Four Terrestrial Climatic Zones Explored with Targeted Metagenomics Using FrameBot, a New Informatics Tool
journal, September 2013
- Wang, Qiong; Quensen, John F.; Fish, Jordan A.
- mBio, Vol. 4, Issue 5
A Sensitive and Accurate protein domain cLassification Tool (SALT) for short reads
journal, June 2013
- Zhang, Yuan; Sun, Yanni; Cole, James R.
- Bioinformatics, Vol. 29, Issue 17
Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
journal, July 2012
- Pell, J.; Hintze, A.; Canino-Koning, R.
- Proceedings of the National Academy of Sciences, Vol. 109, Issue 33
Sequence Homology Search Based on Database Indexing Using the Profile Hidden Markov Model
conference, October 2006
- Xue, Qiang; Cole, James; Pramanik, Sakti
- Sixth IEEE Symposium on BioInformatics and BioEngineering (BIBE'06)
A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea
journal, December 2009
- Wu, Dongying; Hugenholtz, Philip; Mavromatis, Konstantinos
- Nature, Vol. 462, Issue 7276
A Formal Basis for the Heuristic Determination of Minimum Cost Paths
journal, January 1968
- Hart, Peter; Nilsson, Nils; Raphael, Bertram
- IEEE Transactions on Systems Science and Cybernetics, Vol. 4, Issue 2
FunGene: the functional gene pipeline and repository
journal, January 2013
- Fish, Jordan A.; Chai, Benli; Wang, Qiong
- Frontiers in Microbiology, Vol. 4
FragGeneScan: predicting genes in short and error-prone reads
journal, August 2010
- Rho, Mina; Tang, Haixu; Ye, Yuzhen
- Nucleic Acids Research, Vol. 38, Issue 20
Changes in N-Transforming Archaea and Bacteria in Soil during the Establishment of Bioenergy Crops
journal, September 2011
- Mao, Yuejian; Yannarell, Anthony C.; Mackie, Roderick I.
- PLoS ONE, Vol. 6, Issue 9
Works referencing / citing this record:
A multi-source domain annotation pipeline for quantitative metagenomic and metatranscriptomic functional profiling
journal, August 2018
- Ugarte, Ari; Vicedomini, Riccardo; Bernardes, Juliana
- Microbiome, Vol. 6, Issue 1
MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data
journal, June 2016
- Huson, Daniel H.; Beier, Sina; Flade, Isabell
- PLOS Computational Biology, Vol. 12, Issue 6
MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs
journal, October 2017
- Li, Dinghua; Huang, Yukun; Leung, Chi-Ming
- BMC Bioinformatics, Vol. 18, Issue S12
Challenges and Approaches in Microbiome Research: From Fundamental to Applied
journal, August 2018
- Sergaki, Chrysi; Lagunas, Beatriz; Lidbury, Ian
- Frontiers in Plant Science, Vol. 9
Colonic Butyrate-Producing Communities in Humans: an Overview Using Omics Data
journal, December 2017
- Vital, Marius; Karch, André; Pieper, Dietmar H.
- mSystems, Vol. 2, Issue 6
Virulence factor activity relationships (VFARs): a bioinformatics perspective
journal, January 2017
- Waseem, Hassan; Williams, Maggie R.; Stedtfeld, Tiffany
- Environmental Science: Processes & Impacts, Vol. 19, Issue 3
Metagenomic Insights into the Degradation of Resistant Starch by Human Gut Microbiota
journal, September 2018
- Vital, Marius; Howe, Adina; Bergeron, Nathalie
- Applied and Environmental Microbiology, Vol. 84, Issue 23
Metagenome and Metatranscriptome Analyses Using Protein Family Profiles
journal, July 2016
- Zhong, Cuncong; Edlund, Anna; Yang, Youngik
- PLOS Computational Biology, Vol. 12, Issue 7
Community structure explains antibiotic resistance gene dynamics over a temperature gradient in soil
journal, February 2018
- Dunivin, T. K.; Shade, A.
- FEMS Microbiology Ecology, Vol. 94, Issue 3
Distribution and Diversity of Rhodopsin-Producing Microbes in the Chesapeake Bay
journal, April 2018
- Maresca, Julia A.; Miller, Kelsey J.; Keffer, Jessica L.
- Applied and Environmental Microbiology, Vol. 84, Issue 13
New approaches for metagenome assembly with short reads
journal, February 2019
- Ayling, Martin; Clark, Matthew D.; Leggett, Richard M.
- Briefings in Bioinformatics, Vol. 21, Issue 2
Contrasting Pathways for Anaerobic Methane Oxidation in Gulf of Mexico Cold Seep Sediments
journal, February 2019
- Vigneron, Adrien; Alsop, Eric B.; Cruaud, Perrine
- mSystems, Vol. 4, Issue 1
A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data
journal, March 2017
- Roumpeka, Despoina D.; Wallace, R. John; Escalettes, Frank
- Frontiers in Genetics, Vol. 8
Uncovering the trimethylamine-producing bacteria of the human gut microbiota
journal, May 2017
- Rath, Silke; Heidrich, Benjamin; Pieper, Dietmar H.
- Microbiome, Vol. 5, Issue 1
Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads
journal, January 2017
- Huson, Daniel H.; Tappu, Rewati; Bazinet, Adam L.
- Microbiome, Vol. 5, Issue 1
When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data
journal, September 2019
- Rowe, Will P. M.
- Genome Biology, Vol. 20, Issue 1
Post-translational modifications are enriched within protein functional groups important to bacterial adaptation within a deep-sea hydrothermal vent environment
journal, September 2016
- Zhang, Weipeng; Sun, Jin; Cao, Huiluo
- Microbiome, Vol. 4, Issue 1
A global survey of arsenic-related genes in soil microbiomes
journal, May 2019
- Dunivin, Taylor K.; Yeh, Susanna Y.; Shade, Ashley
- BMC Biology, Vol. 17, Issue 1
Computational profiling of the gut–brain axis: microflora dysbiosis insights to neurological disorders
journal, November 2017
- Dovrolis, Nikolas; Kolios, George; Spyrou, George M.
- Briefings in Bioinformatics, Vol. 20, Issue 3
Cellulosic biofuel contributions to a sustainable energy future: Choices and outcomes
journal, June 2017
- Robertson, G. Philip; Hamilton, Stephen K.; Barham, Bradford L.
- Science, Vol. 356, Issue 6345
Machine learning meets genome assembly
journal, August 2018
- Padovani de Souza, Kleber; Setubal, João Carlos; Ponce de Leon F. de Carvalho, André Carlos
- Briefings in Bioinformatics, Vol. 20, Issue 6
Microbial Community Responses to Increased Water and Organic Matter in the Arid Soils of the McMurdo Dry Valleys, Antarctica
journal, July 2016
- Buelow, Heather N.; Winter, Ara S.; Van Horn, David J.
- Frontiers in Microbiology, Vol. 7
Distinct temporal diversity profiles for nitrogen cycling genes in a hyporheic microbiome
journal, January 2020
- Nelson, William C.; Graham, Emily B.; Crump, Alex R.
- PLOS ONE, Vol. 15, Issue 1