Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer
- Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Arkansas for Medical Sciences, Little Rock, AR (United States)
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4,000 viral genome sequences covering a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, because diverse viruses show little evolutionary conservation, it is not feasible to build a viral tree of life with traditional phylogenetic methods based on conserved proteins. In this study, we applied an alignment-free method that uses k-mers as genomic features to carry out a large-scale comparison of the complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) the average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram is consistent with the viral taxonomy of the International Committee on Taxonomy of Viruses (ICTV) and with the Baltimore classification of viruses.
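The three k-selection measures named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: relative entropy RE(k) uses the Markov-model prediction common in feature frequency profile (FFP) analyses, f(w) ≈ f(w[:-1]) · f(w[1:]) / f(w[1:-1]); the cumulative sum is truncated at a hypothetical `k_max`; "common features" is read as the mean number of distinct k-mers shared by each pair of genomes; and the Shannon index is computed per genome with the natural logarithm. All function names are hypothetical, and the toy sequences are not RefSeq data.

```python
import math
from collections import Counter
from itertools import combinations

BASES = set("ACGT")


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers; windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= BASES
    )
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def relative_entropy(seq: str, k: int) -> float:
    """RE(k) in bits: divergence of observed k-mer frequencies from the frequencies
    predicted from (k-1)- and (k-2)-mers (assumed Markov-model form)."""
    if k < 2:
        raise ValueError("k must be >= 2")
    fk, f1, f2 = kmer_freqs(seq, k), kmer_freqs(seq, k - 1), kmer_freqs(seq, k - 2)
    re = 0.0
    for w, p in fk.items():
        # Every observed k-mer's sub-words were also observed, so these lookups succeed.
        expected = f1[w[:-1]] * f1[w[1:]] / f2[w[1:-1]]
        re += p * math.log2(p / expected)
    return re


def cumulative_relative_entropy(seq: str, k: int, k_max: int) -> float:
    """CRE(k) = RE(k) + RE(k+1) + ... + RE(k_max); k_max is an assumed truncation point."""
    return sum(relative_entropy(seq, i) for i in range(k, k_max + 1))


def avg_common_features(seqs: list[str], k: int) -> float:
    """Mean number of distinct k-mers shared by each pair of genomes (needs >= 2 genomes)."""
    sets = [set(kmer_freqs(s, k)) for s in seqs]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) for a, b in pairs) / len(pairs)


def shannon_index(seq: str, k: int) -> float:
    """Shannon diversity H = -sum(p * ln p) over one genome's k-mer frequency profile."""
    return -sum(p * math.log(p) for p in kmer_freqs(seq, k).values())


if __name__ == "__main__":
    # Toy sequences for illustration only.
    genomes = [
        "ATGCGATACGCTTGAGGCTAAGCTTAGC",
        "ATGCGATTCGCTAGAGGCAAAGCTTTGC",
        "TTGCGGTACGATTGACGCTAAGGTTAGC",
    ]
    for k in range(3, 7):
        cre = sum(cumulative_relative_entropy(g, k, 8) for g in genomes) / len(genomes)
        h = sum(shannon_index(g, k) for g in genomes) / len(genomes)
        common = avg_common_features(genomes, k)
        print(f"k={k}  mean CRE={cre:.3f}  common={common:.1f}  mean H={h:.3f}")
```

Scanning k and comparing the three curves side by side mirrors the kind of analysis the abstract describes; how the curves are combined into a single optimal k is not reproduced in this sketch.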
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1351783
- Journal Information:
- Scientific Reports, Vol. 7; ISSN 2045-2322
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
Citing Works (Web of Science)
- Defining a Core Genome for the Herpesvirales and Exploring their Evolutionary Relationship with the Caudovirales (journal, August 2019)
- Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes (journal, May 2018)
- Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard (journal, September 2018)
- SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform (journal, May 2018)
Similar Records
- Kraken2 Metagenomic Virus Database
- Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation