OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data

Abstract

The analysis of large microbiome data sets holds great promise for the delineation of the biological and metabolic functioning of living organisms and their role in the environment. In the midst of this genomic puzzle, viruses, especially those that infect microbial communities, represent a major reservoir of genetic diversity with great impact on biogeochemical cycles and organismal health. Overcoming the limitations associated with virus detection directly from microbiomes can provide key insights into how ecosystem dynamics are modulated. Here, we present a computational protocol for accurate detection and grouping of viral sequences from microbiome samples. Our approach relies on an expanded and curated set of viral protein families used as bait to identify viral sequences directly from metagenomic assemblies. This protocol describes how to use the viral protein families catalog (~7 h) and recommended filters for the detection of viral contigs in metagenomic samples (~6 h), and it describes the specific parameters for a nucleotide-sequence-identity-based method of organizing the viral sequences into quasi-species taxonomic-level groups (~10 min).
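The abstract names a nucleotide-sequence-identity-based grouping step but does not state its thresholds or linkage rule. The sketch below shows one plausible way to turn pairwise nucleotide alignments (for example, parsed from BLAST tabular output) into groups. It uses single linkage via union-find over contig pairs whose percent identity and coverage of the shorter contig both pass a cut-off. The `Hit` record, the `cluster_contigs` function, and the 90% identity / 75% coverage defaults are hypothetical placeholders, not the protocol's published parameters. Whether the protocol uses single linkage or a centroid-based scheme should be checked against the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise nucleotide alignment (hypothetical record, e.g. from BLAST tabular output)."""
    query: str
    subject: str
    identity: float      # percent identity of the aligned region (0-100)
    aligned_length: int  # length of the aligned region in bp


def cluster_contigs(lengths, hits, min_identity=90.0, min_coverage=0.75):
    """Group contigs by single linkage over pairs that pass both thresholds.

    lengths: dict mapping contig ID -> contig length in bp.
    hits: iterable of Hit records (assumed: one aggregated hit per contig pair).
    The default thresholds are illustrative placeholders only.
    """
    parent = {contig: contig for contig in lengths}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for hit in hits:
        if hit.query == hit.subject:
            continue  # self-hits carry no grouping information
        shorter = min(lengths[hit.query], lengths[hit.subject])
        coverage = hit.aligned_length / shorter  # fraction of the shorter contig aligned
        if hit.identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(hit.query), find(hit.subject)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = {}
    for contig in lengths:
        groups.setdefault(find(contig), []).append(contig)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    lengths = {"c1": 10000, "c2": 8000, "c3": 5000, "c4": 12000}
    hits = [
        Hit("c1", "c2", 97.5, 7600),  # 95% of the shorter contig (c2) -> linked
        Hit("c2", "c3", 92.0, 2000),  # only 40% of c3 -> not linked
        Hit("c3", "c4", 85.0, 4900),  # identity below 90% -> not linked
    ]
    print(cluster_contigs(lengths, hits))  # [['c1', 'c2'], ['c3'], ['c4']]
```

Single linkage is transitive: if A links to B and B links to C, all three share a group even when A and C never align well. A real pipeline would also need to merge multiple local alignments (HSPs) per contig pair before computing coverage, which this sketch deliberately leaves out.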

Authors:
Paez-Espino, David [1]; Pavlopoulos, Georgios A. [1]; Ivanova, Natalia N. [1]; Kyrpides, Nikos C. [1]
  1. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
Publication Date:
July 2017
Research Org.:
Lawrence Berkeley National Laboratory-National Energy Research Scientific Computing Center
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1489397
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article
Journal Name:
Nature Protocols
Additional Journal Information:
Journal Volume: 12; Journal Issue: 8; Journal ID: ISSN 1754-2189
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English

Citation Formats

MLA: Paez-Espino, David, Pavlopoulos, Georgios A., Ivanova, Natalia N., and Kyrpides, Nikos C. Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data. United States: N. p., 2017. Web. doi:10.1038/nprot.2017.063.
APA: Paez-Espino, David, Pavlopoulos, Georgios A., Ivanova, Natalia N., & Kyrpides, Nikos C. Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data. United States. doi:10.1038/nprot.2017.063.
Chicago: Paez-Espino, David, Pavlopoulos, Georgios A., Ivanova, Natalia N., and Kyrpides, Nikos C. 2017. "Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data". United States. doi:10.1038/nprot.2017.063.
@article{osti_1489397,
title = {Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data},
author = {Paez-Espino, David and Pavlopoulos, Georgios A. and Ivanova, Natalia N. and Kyrpides, Nikos C.},
abstractNote = {The analysis of large microbiome data sets holds great promise for the delineation of the biological and metabolic functioning of living organisms and their role in the environment. In the midst of this genomic puzzle, viruses, especially those that infect microbial communities, represent a major reservoir of genetic diversity with great impact on biogeochemical cycles and organismal health. Overcoming the limitations associated with virus detection directly from microbiomes can provide key insights into how ecosystem dynamics are modulated. Here, we present a computational protocol for accurate detection and grouping of viral sequences from microbiome samples. Our approach relies on an expanded and curated set of viral protein families used as bait to identify viral sequences directly from metagenomic assemblies. This protocol describes how to use the viral protein families catalog (~7 h) and recommended filters for the detection of viral contigs in metagenomic samples (~6 h), and it describes the specific parameters for a nucleotide-sequence-identity-based method of organizing the viral sequences into quasi-species taxonomic-level groups (~10 min).},
doi = {10.1038/nprot.2017.063},
journal = {Nature Protocols},
issn = {1754-2189},
number = 8,
volume = 12,
place = {United States},
year = {2017},
month = {7}
}
