DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Using populations of human and microbial genomes for organism detection in metagenomes

Abstract

Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. In conclusion, human reads that are left unidentified can complicate and slow down further analysis, lead to inaccurately labeled microbial taxa, and ultimately raise privacy concerns as more human genome data are collected.

Authors:
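 Ames, Sasha K. [1]; Gardner, Shea N. [2]; Marti, Jose Manuel [3]; Slezak, Tom R. [4]; Gokhale, Maya B. [1]; Allen, Jonathan E. [4]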
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Global Security Computer Applications Division
  3. Univ. of Valencia, Valencia (Spain). Inst. de Fisica Corpuscular
  4. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Global Security Computer Applications Division
Publication Date:
2015-04-29
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1263571
Grant/Contract Number:  
AC52-07NA27344; 33-ER-2012; 08-ER-2011
Resource Type:
Accepted Manuscript
Journal Name:
Genome Research
Additional Journal Information:
Journal Volume: 25; Journal Issue: 7; Journal ID: ISSN 1088-9051
Publisher:
Cold Spring Harbor Laboratory Press
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Ames, Sasha K., Gardner, Shea N., Marti, Jose Manuel, Slezak, Tom R., Gokhale, Maya B., and Allen, Jonathan E. Using populations of human and microbial genomes for organism detection in metagenomes. United States: N. p., 2015. Web. doi:10.1101/gr.184879.114.
Ames, Sasha K., Gardner, Shea N., Marti, Jose Manuel, Slezak, Tom R., Gokhale, Maya B., & Allen, Jonathan E. Using populations of human and microbial genomes for organism detection in metagenomes. United States. https://doi.org/10.1101/gr.184879.114
Ames, Sasha K., Gardner, Shea N., Marti, Jose Manuel, Slezak, Tom R., Gokhale, Maya B., and Allen, Jonathan E. 2015. "Using populations of human and microbial genomes for organism detection in metagenomes". United States. https://doi.org/10.1101/gr.184879.114. https://www.osti.gov/servlets/purl/1263571.
@article{osti_1263571,
title = {Using populations of human and microbial genomes for organism detection in metagenomes},
author = {Ames, Sasha K. and Gardner, Shea N. and Marti, Jose Manuel and Slezak, Tom R. and Gokhale, Maya B. and Allen, Jonathan E.},
doi = {10.1101/gr.184879.114},
journal = {Genome Research},
number = 7,
volume = 25,
place = {United States},
year = {2015},
month = {4}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 24 works
Citation information provided by Web of Science

Works referencing / citing this record:

Recentrifuge: Robust comparative analysis and contamination removal for metagenomics
journal, April 2019


Host genetic variation impacts microbiome composition across human body sites
journal, September 2015


Host‐derived population genomics data provides insights into bacterial and diatom composition of the killer whale skin
journal, October 2018

  • Hooper, Rebecca; Brealey, Jaelle C.; Valk, Tom
  • Molecular Ecology, Vol. 28, Issue 2
  • DOI: 10.1111/mec.14860

A novel variant of torque teno virus 7 identified in patients with Kawasaki disease
journal, December 2018


Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection
journal, July 2015


Embracing the gut microbiota: the new frontier for inflammatory and infectious diseases
journal, January 2017

  • van den Elsen, Lieke WJ; Poyntz, Hazel C.; Weyrich, Laura S.
  • Clinical & Translational Immunology, Vol. 6, Issue 1
  • DOI: 10.1038/cti.2016.91

Host-derived population genomics data provides insights into bacterial and diatom composition of the killer whale skin
posted_content, July 2018

  • Hooper, Rebecca; Brealey, Jaelle C.; van der Valk, Tom
  • Molecular Ecology
  • DOI: 10.1101/282038

Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome
journal, December 2018


Health and Disease Imprinted in the Time Variability of the Human Microbiome
journal, March 2017