U.S. Department of Energy
Office of Scientific and Technical Information

Scalable metagenomic taxonomy classification using a reference genome database

Journal Article · Bioinformatics
 [1];  [2];  [3];  [2];  [2];  [3]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing; DOE/OSTI
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing
  3. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing. Global Security Directorate

Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and to accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge.

Results: A method is presented that shifts computational costs to an offline computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data, showing accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset took <20 h on a single-node, 40-core, large-memory machine and provided new insights into the metagenomic contents of the sample.

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
AC52-07NA27344
OSTI ID:
1625283
Journal Information:
Journal Name: Bioinformatics; Journal Volume: 29; Journal Issue: 18; ISSN 1367-4803
Publisher:
International Society for Computational Biology - Oxford University Press
Country of Publication:
United States
Language:
English

