DOE Data Explorer title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Kraken2 Metagenomic Virus Database

Abstract

The Database: Kraken2 [1] database built from a classification tree containing over 700k metagenomic viruses from JGI IMG/VR [2]. (1) Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol., 20(1), 1–13. doi: 10.1186/s13059-019-1891-0 (2) Paez-Espino D, Chen I-MA, Palaniappan K, Ratner A, Chu K, Szeto E, et al. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017;45:D457–65. For Paper: Title: A k-mer based approach for virus classification in metatranscriptomic and metagenomic samples identifies viral associations in the Populus phytobiome and autism brains Abstract Background Viruses are an underrepresented taxa in the study and identification of microbiome constituents; however, they play an important role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus' genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples.more » Methods To identify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken, a metagenomic/transcriptomic classifier. Results To illustrate the breadth of our utility for classifying viruses with ParaKraken, we analyzed data from a plant metagenome study identifying the differences between two Populus genotypes in three different compartments and on a human metatranscriptome study identifying the differences between Autism Spectrum Disorder patients and controls in post mortem brain biopsies. In the Populus study, we identified genotype and compartment specific viral signatures, while in the Autism study we identified a significant increased abundance of eight viral sequences in Autism brain biopsies. Conclusion Viruses represent an important aspect of the microbiome. The ability to classify viruses represents the first step in being able to better understand their role in the microbiome. The viral classification method presented here allows for more complete identification of viral sequences for use in identifying associations between viruses and the host and viruses and other microbiome members. Acknowledgements and Funding This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by the Plant-Microbe Interfaces Scientific Focus Area in the Genomic Science Program, the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science, and by the Department of Energy, Laboratory Directed Research and Development funding (ProjectID 8321), at the Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US DOE under contract DE-AC05-00OR22725. This research used resources of the Compute and Data Environment for Science (CADES).« less

Authors:
; ; ; ; ; ; ; ;
Publication Date:
DOE Contract Number:  
DE-AC05-00OR22725
Product Type:
Dataset
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Subject:
59 BASIC BIOLOGICAL SCIENCES; 60 APPLIED LIFE SCIENCES
Keywords:
virus,metagenomic,autism,phytobiome
OSTI Identifier:
1615774
DOI:
https://doi.org/10.13139/OLCF/1615774

Citation Formats

Garcia, Benjamin J, Simha, Ramanuja, Garvin, Michael, Furches, Anna, Jones, Piet, Hyatt, P. Doug, Schadt, Christopher, Pelletier, Dale, and Jacobson, Daniel. Kraken2 Metagenomic Virus Database. United States: N. p., 2020. Web. doi:10.13139/OLCF/1615774.
Garcia, Benjamin J, Simha, Ramanuja, Garvin, Michael, Furches, Anna, Jones, Piet, Hyatt, P. Doug, Schadt, Christopher, Pelletier, Dale, & Jacobson, Daniel. Kraken2 Metagenomic Virus Database. United States. doi:https://doi.org/10.13139/OLCF/1615774
Garcia, Benjamin J, Simha, Ramanuja, Garvin, Michael, Furches, Anna, Jones, Piet, Hyatt, P. Doug, Schadt, Christopher, Pelletier, Dale, and Jacobson, Daniel. 2020. "Kraken2 Metagenomic Virus Database". United States. doi:https://doi.org/10.13139/OLCF/1615774. https://www.osti.gov/servlets/purl/1615774. Pub date:Thu Apr 23 00:00:00 EDT 2020
@article{osti_1615774,
title = {Kraken2 Metagenomic Virus Database},
author = {Garcia, Benjamin J and Simha, Ramanuja and Garvin, Michael and Furches, Anna and Jones, Piet and Hyatt, P. Doug and Schadt, Christopher and Pelletier, Dale and Jacobson, Daniel},
abstractNote = {The Database: Kraken2 [1] database built from a classification tree containing over 700k metagenomic viruses from JGI IMG/VR [2]. (1) Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol., 20(1), 1–13. doi: 10.1186/s13059-019-1891-0 (2) Paez-Espino D, Chen I-MA, Palaniappan K, Ratner A, Chu K, Szeto E, et al. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 2017;45:D457–65. For Paper: Title: A k-mer based approach for virus classification in metatranscriptomic and metagenomic samples identifies viral associations in the Populus phytobiome and autism brains Abstract Background Viruses are an underrepresented taxa in the study and identification of microbiome constituents; however, they play an important role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus' genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples. Methods To identify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken, a metagenomic/transcriptomic classifier. Results To illustrate the breadth of our utility for classifying viruses with ParaKraken, we analyzed data from a plant metagenome study identifying the differences between two Populus genotypes in three different compartments and on a human metatranscriptome study identifying the differences between Autism Spectrum Disorder patients and controls in post mortem brain biopsies. In the Populus study, we identified genotype and compartment specific viral signatures, while in the Autism study we identified a significant increased abundance of eight viral sequences in Autism brain biopsies. Conclusion Viruses represent an important aspect of the microbiome. The ability to classify viruses represents the first step in being able to better understand their role in the microbiome. The viral classification method presented here allows for more complete identification of viral sequences for use in identifying associations between viruses and the host and viruses and other microbiome members. Acknowledgements and Funding This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by the Plant-Microbe Interfaces Scientific Focus Area in the Genomic Science Program, the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science, and by the Department of Energy, Laboratory Directed Research and Development funding (ProjectID 8321), at the Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US DOE under contract DE-AC05-00OR22725. This research used resources of the Compute and Data Environment for Science (CADES).},
doi = {10.13139/OLCF/1615774},
journal = {},
number = ,
volume = ,
place = {United States},
year = {2020},
month = {4}
}