DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes

Abstract

Viruses are an underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus’ genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples, even if they do not have a taxonomy. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses to create a database of metagenomic viruses. The resulting Kraken2 database of the metagenomic viruses can be found here: https://www.osti.gov/biblio/1615774 and is compatible with Kraken2. We then integrated the viral classification database with databases created with genomes from NCBI for use with ParaKraken (a parallelized version of Kraken provided in Supplemental Zip 1), a metagenomic/transcriptomic classifier. To illustrate the breadth of our utility for classifying metagenome viruses, we analyzed data from a plant metagenome study identifying genotypic and compartment specific differences between two Populus genotypes in three different compartments. We also identified a significant increase in abundance of eight viral sequences in post mortem brains in a human metatranscriptome study comparing Autism Spectrum Disorder patients and controls.
We also show the potential accuracy for classifying viruses by utilizing both the JGI and NCBI viral databases to identify the uniqueness of viral sequences. Finally, we validate the accuracy of viral classification with NCBI databases containing viruses with taxonomy to identify pathogenic viruses in known COVID-19 and cassava brown streak virus infection samples. Our method represents the compulsory first step in better understanding the role of viruses in the microbiome by allowing for a more complete identification of sequences without taxonomy. Better classification of viruses will improve identifying associations between viruses and their hosts as well as viruses and other microbiome members. Despite the lack of taxonomy, this database of metagenomic viruses can be used with any tool that utilizes a taxonomy, such as Kraken, for accurate classification of viruses.
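
Illustrative sketch (not part of the original record): the abstract describes scoring pairs of viral sequences by a proportional similarity computed from their k-mer profiles. The Python below shows one way such a score could be computed. It is an assumption-laden illustration rather than the authors' implementation: the function names, the choice of k = 4, and the definition of proportional similarity as the sum of the minimum relative k-mer frequencies are all hypothetical choices made here, and the record does not state which definition or k-mer length the paper uses.

```python
from collections import Counter


def kmer_profile(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in a nucleotide sequence.

    Windows containing characters other than A, C, G, T are skipped.
    The default k=4 is an arbitrary choice for illustration only.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    profile = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            profile[kmer] += 1
    return profile


def proportional_similarity(a: Counter, b: Counter) -> float:
    """Sum, over shared k-mers, of the smaller relative frequency.

    This is one common definition of proportional similarity; it ranges
    from 0 (no shared k-mers) to 1 (identical relative profiles).
    Assumed here for illustration; the paper's score may differ.
    """
    total_a = sum(a.values())
    total_b = sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    shared = a.keys() & b.keys()
    return sum(min(a[x] / total_a, b[x] / total_b) for x in shared)


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    first = kmer_profile("ACGTACGTTTGACA")
    second = kmer_profile("ACGTACGAATGACA")
    print(proportional_similarity(first, second))
```

A matrix of such pairwise scores could then be passed to a clustering step to group similar sequences, which is the role the abstract assigns to the similarity scores when building the classification tree.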

Authors:
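Garcia, Benjamin J. [1]; Simha, Ramanuja [2]; Garvin, Michael [2]; Furches, Anna K. [3]; Jones, Piet C. [3]; Felipe Machado Gazolla, Joao Gabriel [2]; Hyatt, Philip [2]; Schadt, Christopher Warren [2]; Pelletier, Dale [2]; Jacobson, Daniel A. [3]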
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Tennessee, Knoxville, TN (United States)
Publication Date:
2021-10-25
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1831698
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Computational and Structural Biotechnology Journal
Additional Journal Information:
Journal Volume: 19; Journal Issue: na; Journal ID: ISSN 2001-0370
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; metagenomics; virome; metatranscriptomics; microbiomes; autism spectrum disorder; populus

Citation Formats

Garcia, Benjamin J., Simha, Ramanuja, Garvin, Michael, Furches, Anna K., Jones, Piet C., Felipe Machado Gazolla, Joao Gabriel, Hyatt, Philip, Schadt, Christopher Warren, Pelletier, Dale, and Jacobson, Daniel A. A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes. United States: N. p., 2021. Web. doi:10.1016/j.csbj.2021.10.029.
Garcia, Benjamin J., Simha, Ramanuja, Garvin, Michael, Furches, Anna K., Jones, Piet C., Felipe Machado Gazolla, Joao Gabriel, Hyatt, Philip, Schadt, Christopher Warren, Pelletier, Dale, & Jacobson, Daniel A. A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes. United States. https://doi.org/10.1016/j.csbj.2021.10.029
Garcia, Benjamin J., Simha, Ramanuja, Garvin, Michael, Furches, Anna K., Jones, Piet C., Felipe Machado Gazolla, Joao Gabriel, Hyatt, Philip, Schadt, Christopher Warren, Pelletier, Dale, and Jacobson, Daniel A. 2021. "A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes". United States. https://doi.org/10.1016/j.csbj.2021.10.029. https://www.osti.gov/servlets/purl/1831698.
@article{osti_1831698,
title = {A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes},
author = {Garcia, Benjamin J. and Simha, Ramanuja and Garvin, Michael and Furches, Anna K. and Jones, Piet C. and Felipe Machado Gazolla, Joao Gabriel and Hyatt, Philip and Schadt, Christopher Warren and Pelletier, Dale and Jacobson, Daniel A.},
abstractNote = {Viruses are an underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus’ genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify their sequences in metagenomic and metatranscriptomic samples, even if they do not have a taxonomy. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses to create a database of metagenomic viruses. The resulting Kraken2 database of the metagenomic viruses can be found here: https://www.osti.gov/biblio/1615774 and is compatible with Kraken2. We then integrated the viral classification database with databases created with genomes from NCBI for use with ParaKraken (a parallelized version of Kraken provided in Supplemental Zip 1), a metagenomic/transcriptomic classifier. To illustrate the breadth of our utility for classifying metagenome viruses, we analyzed data from a plant metagenome study identifying genotypic and compartment specific differences between two Populus genotypes in three different compartments. We also identified a significant increase in abundance of eight viral sequences in post mortem brains in a human metatranscriptome study comparing Autism Spectrum Disorder patients and controls. 
We also show the potential accuracy for classifying viruses by utilizing both the JGI and NCBI viral databases to identify the uniqueness of viral sequences. Finally, we validate the accuracy of viral classification with NCBI databases containing viruses with taxonomy to identify pathogenic viruses in known COVID-19 and cassava brown streak virus infection samples. Our method represents the compulsory first step in better understanding the role of viruses in the microbiome by allowing for a more complete identification of sequences without taxonomy. Better classification of viruses will improve identifying associations between viruses and their hosts as well as viruses and other microbiome members. Despite the lack of taxonomy, this database of metagenomic viruses can be used with any tool that utilizes a taxonomy, such as Kraken, for accurate classification of viruses.},
doi = {10.1016/j.csbj.2021.10.029},
journal = {Computational and Structural Biotechnology Journal},
number = {na},
volume = {19},
place = {United States},
year = {2021},
month = oct
}

Works referenced in this record:

Does Rubella Cause Autism: A 2015 Reappraisal?
journal, February 2016


Redondoviridae, a Family of Small, Circular DNA Viruses of the Human Oro-Respiratory Tract Associated with Periodontitis and Critical Illness
journal, August 2019


Dynamic Modulation of the Gut Microbiota and Metabolome by Bacteriophages in a Mouse Model
journal, June 2019


Improved metagenomic analysis with Kraken 2
journal, November 2019


Evaluation of a concatenated protein phylogeny for classification of tailed double-stranded DNA viruses belonging to the order Caudovirales
journal, May 2019

  • Low, Soo Jen; Džunková, Mária; Chaumeil, Pierre-Alain
  • Nature Microbiology, Vol. 4, Issue 8
  • DOI: 10.1038/s41564-019-0448-z

STAR: ultrafast universal RNA-seq aligner
journal, October 2012


Viruses as Winners in the Game of Life
journal, September 2016


Prenatal and Perinatal Risk Factors for Autism in China
journal, April 2010

  • Zhang, Xin; Lv, Cong-Chao; Tian, Jiang
  • Journal of Autism and Developmental Disorders, Vol. 40, Issue 11
  • DOI: 10.1007/s10803-010-0992-0

Virus taxonomy in the age of metagenomics
journal, January 2017

  • Simmonds, Peter; Adams, Mike J.; Benkő, Mária
  • Nature Reviews Microbiology, Vol. 15, Issue 3
  • DOI: 10.1038/nrmicro.2016.177

Shared and unique responses of plants to multiple individual stresses and stress combinations: physiological and molecular mechanisms
journal, September 2015

  • Pandey, Prachi; Ramegowda, Venkategowda; Senthil-Kumar, Muthappa
  • Frontiers in Plant Science, Vol. 6
  • DOI: 10.3389/fpls.2015.00723

The interactive effects of simultaneous biotic and abiotic stresses on plants: Mechanistic understanding from drought and pathogen combination
journal, March 2015


Peripheral Nervous System Manifestations of Infectious Diseases
journal, June 2014


Association of Human Immunodeficiency Virus Infection and Risk of Peripheral Artery Disease
journal, July 2018


Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data
journal, July 2017

  • Paez-Espino, David; Pavlopoulos, Georgios A.; Ivanova, Natalia N.
  • Nature Protocols, Vol. 12, Issue 8
  • DOI: 10.1038/nprot.2017.063

Fold change rank ordering statistics: a new method for detecting differentially expressed genes
journal, January 2014


Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads
journal, June 2014


Real Time Classification of Viruses in 12 Dimensions
journal, May 2013


IMG: the integrated microbial genomes database and comparative analysis system
journal, December 2011

  • Markowitz, V. M.; Chen, I. -M. A.; Palaniappan, K.
  • Nucleic Acids Research, Vol. 40, Issue D1
  • DOI: 10.1093/nar/gkr1044

Peach RNA viromes in six different peach cultivars
journal, January 2018


Association Between the Respiratory Microbiome and Susceptibility to Influenza Virus Infection
journal, September 2019

  • Tsang, Tim K.; Lee, Kyu Han; Foxman, Betsy
  • Clinical Infectious Diseases, Vol. 71, Issue 5
  • DOI: 10.1093/cid/ciz968

The Virome of Cerebrospinal Fluid: Viruses Where We Once Thought There Were None
journal, September 2019

  • Ghose, Chandrabali; Ly, Melissa; Schwanemann, Leila K.
  • Frontiers in Microbiology, Vol. 10
  • DOI: 10.3389/fmicb.2019.02061

Partitioning the Genetic Diversity of a Virus Family: Approach and Evaluation through a Case Study of Picornaviruses
journal, January 2012


Kraken: ultrafast metagenomic sequence classification using exact alignments
journal, January 2014


Single-cell genomics identifies cell type–specific molecular changes in autism
journal, May 2019


Destabilization of the gut microbiome marks the end-stage of simian immunodeficiency virus infection in wild chimpanzees: Impact of SIVcpz on the Gut Microbiome
journal, December 2015

  • Barbian, Hannah J.; Li, Yingying; Ramirez, Miguel
  • American Journal of Primatology, Vol. 80, Issue 1
  • DOI: 10.1002/ajp.22515

Autistic disorder and viral infections
journal, February 2005


IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses
journal, October 2016

  • Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna
  • Nucleic Acids Research, Vol. 45, Issue D1
  • DOI: 10.1093/nar/gkw1030

Uncovering Earth’s virome
journal, August 2016

  • Paez-Espino, David; Eloe-Fadrosh, Emiley A.; Pavlopoulos, Georgios A.
  • Nature, Vol. 536, Issue 7617
  • DOI: 10.1038/nature19094

HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
journal, January 2018

  • Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.
  • Nucleic Acids Research, Vol. 46, Issue 6
  • DOI: 10.1093/nar/gkx1313

ViromeScan: a new tool for metagenomic viral community profiling
journal, March 2016


Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
journal, November 2015

  • O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney
  • Nucleic Acids Research, Vol. 44, Issue D1
  • DOI: 10.1093/nar/gkv1189

Autism After Infection, Febrile Episodes, and Antibiotic Use During Pregnancy: An Exploratory Study
journal, November 2012

  • Atladottir, H. O.; Henriksen, T. B.; Schendel, D. E.
  • PEDIATRICS, Vol. 130, Issue 6
  • DOI: 10.1542/peds.2012-1107

Phytobiome and Transcriptional Adaptation of Populus deltoides to Acute Progressive Drought and Cyclic Drought
journal, January 2018


A Novel Method of Characterizing Genetic Sequences: Genome Space with Biological Distance and Applications
journal, March 2011


Metavir 2: new tools for viral metagenome comparison and assembled virome analysis
journal, January 2014


Phytovirome Analysis of Wild Plant Populations: Comparison of Double-Stranded RNA and Virion-Associated Nucleic Acid Metagenomic Approaches
journal, December 2019

  • Ma, Yuxin; Marais, Armelle; Lefebvre, Marie
  • Journal of Virology, Vol. 94, Issue 1
  • DOI: 10.1128/JVI.01462-19

Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks
journal, November 2003


Association Between Hepatitis C Virus and Chronic Kidney Disease: A Systematic Review and Meta-Analysis
journal, May 2018

  • Fabrizi, Fabrizio; Donato, Francesca M.; Messa, Piergiorgio
  • Annals of Hepatology, Vol. 17, Issue 3
  • DOI: 10.5604/01.3001.0011.7382

Changes to virus taxonomy and the International Code of Virus Classification and Nomenclature ratified by the International Committee on Taxonomy of Viruses (2019)
journal, June 2019

  • Walker, Peter J.; Siddell, Stuart G.; Lefkowitz, Elliot J.
  • Archives of Virology, Vol. 164, Issue 9
  • DOI: 10.1007/s00705-019-04306-w

Common infections with polyomaviruses and herpesviruses and neuropsychological development at 4 years of age, the Rhea birth cohort in Crete, Greece
journal, June 2016

  • Karachaliou, Marianna; Chatzi, Leda; Roumeliotaki, Theano
  • Journal of Child Psychology and Psychiatry, Vol. 57, Issue 11
  • DOI: 10.1111/jcpp.12582

Atropos: specific, sensitive, and speedy trimming of sequencing reads
journal, January 2017

  • Didion, John P.; Martin, Marcel; Collins, Francis S.
  • PeerJ, Vol. 5
  • DOI: 10.7717/peerj.3720

An efficient algorithm for large-scale detection of protein families
journal, April 2002


The Role of the Immune System in Autism Spectrum Disorder
journal, August 2016

  • Meltzer, Amory; Van de Water, Judy
  • Neuropsychopharmacology, Vol. 42, Issue 1
  • DOI: 10.1038/npp.2016.158

Methods for virus classification and the challenge of incorporating metagenomic sequence data
journal, June 2015


Association of autism with polyomavirus infection in postmortem brains
journal, April 2010

  • Lintas, Carla; Altieri, Laura; Lombardi, Federica
  • Journal of Neurovirology, Vol. 16, Issue 2
  • DOI: 10.3109/13550281003685839

Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification
journal, August 2014

  • Bao, Yiming; Chetvernin, Vyacheslav; Tatusova, Tatiana
  • Archives of Virology, Vol. 159, Issue 12
  • DOI: 10.1007/s00705-014-2197-x

Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis
journal, July 2016

  • Utturkar, Sagar M.; Cude, W. Nathan; Robeson, Michael S.
  • Applied and Environmental Microbiology, Vol. 82, Issue 18
  • DOI: 10.1128/AEM.01285-16

Kraken2 Metagenomic Virus Database
dataset, January 2020

  • Garcia, Benjamin; Simha, Ramanuja; Garvin, Michael
  • Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
  • DOI: 10.13139/olcf/1615774