Title: MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gku169 · OSTI ID: 1904533

Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it employs all genes present in an unknown sequence as classifiers, weighting each gene based on its (predetermined) classifying power at a given taxonomic level and frequency of horizontal gene transfer. MyTaxa also implements a novel classification scheme based on the genome-aggregate average amino acid identity concept to determine the degree of novelty of sequences representing uncharacterized taxa, i.e. whether they represent novel species, genera or phyla. Application of MyTaxa on in silico generated (mock) and real metagenomes of varied read length (100–2000 bp) revealed that it correctly classified at least 5% more sequences than any other tool. The analysis also showed that ~10% of the assembled sequences from human gut metagenomes represent novel species with no sequenced representatives, several of which were highly abundant in situ such as members of the Prevotella genus. Thus, MyTaxa can find several important applications in microbial identification and diversity studies.
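The gene-weighting idea described in the abstract can be illustrated with a minimal sketch. This is not MyTaxa's actual scoring scheme, which the abstract does not specify. The function name `classify_sequence`, the example weights, and the `min_share` threshold are all hypothetical, chosen only to show how per-gene weights might be combined into one call for a sequence.

```python
from collections import defaultdict


def classify_sequence(gene_hits, min_share=0.5):
    """Combine per-gene taxon votes into one call for a sequence.

    gene_hits: list of (taxon, weight) pairs, one per gene on the sequence.
        The weight stands in for a gene's predetermined classifying power
        (hypothetical values; not taken from MyTaxa).
    min_share: hypothetical cutoff; below it the sequence stays unclassified.

    Returns (taxon or None, share of total weight held by the top taxon).
    """
    totals = defaultdict(float)
    for taxon, weight in gene_hits:
        totals[taxon] += weight

    total_weight = sum(totals.values())
    if total_weight == 0:
        return None, 0.0

    # The taxon with the largest summed weight wins the vote.
    best_taxon = max(totals, key=totals.get)
    share = totals[best_taxon] / total_weight
    if share < min_share:
        return None, share
    return best_taxon, share


if __name__ == "__main__":
    # Illustrative input only: four genes on one contig with made-up weights.
    hits = [
        ("Prevotella", 0.9),
        ("Prevotella", 0.6),
        ("Bacteroides", 0.3),
        ("Escherichia", 0.1),
    ]
    taxon, share = classify_sequence(hits)
    print(taxon, round(share, 2))
```

In this sketch, a gene prone to horizontal transfer would simply carry a low weight, so a single misleading hit cannot outvote several reliable genes that agree with one another.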

Research Organization:
Univ. of Oklahoma, Norman, OK (United States); Georgia Institute of Technology, Atlanta, GA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF)
Grant/Contract Number:
SC0004601; 1241046
OSTI ID:
1904533
Journal Information:
Nucleic Acids Research, Vol. 42, Issue 8; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
