Using populations of human and microbial genomes for organism detection in metagenomes
Abstract
Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) is a powerful option when targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating known from unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available, in a database optimized for efficient metagenomic search, by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow further analysis, lead to inaccurately labeled microbial taxa, and ultimately raise privacy concerns as more human genome data is collected.
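The per-sample human read percentages quoted in the abstract can be illustrated with a small sketch. This is not the authors' pipeline or database: it assumes a simple exact k-mer lookup, and the function names, the value of k, the 0.5 match threshold, and the toy sequences are all hypothetical choices made for illustration. It also ignores reverse complements and sequencing errors, which a real screen would have to handle.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(sequences: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer seen in a set of (hypothetical) human sequences."""
    index: Set[str] = set()
    for s in sequences:
        index |= kmers(s, k)
    return index


def is_human(read: str, human_index: Set[str], k: int,
             min_fraction: float = 0.5) -> bool:
    """Label a read as human if enough of its distinct k-mers hit the index.

    min_fraction is an assumed cutoff, not a value from the source.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k: nothing to match
        return False
    hits = sum(1 for km in read_kmers if km in human_index)
    return hits / len(read_kmers) >= min_fraction


def human_read_fraction(reads: Iterable[str], human_index: Set[str], k: int,
                        min_fraction: float = 0.5) -> float:
    """Fraction of reads in one sample that are labeled human."""
    read_list: List[str] = list(reads)
    if not read_list:
        return 0.0
    labeled = sum(is_human(r, human_index, k, min_fraction) for r in read_list)
    return labeled / len(read_list)


# Toy data (made up for illustration only).
K = 4
toy_human_index = build_index(["ACGTACGTGG"], K)
toy_reads = ["ACGTACGT", "TTTTTTTT", "GGGGCCCC", "CGTGGAAA"]
demo_fraction = human_read_fraction(toy_reads, toy_human_index, K)
print(f"human read fraction: {demo_fraction:.2f}")
```

Under these assumptions, a sample would count toward the "10% or more human reads" group when `human_read_fraction` returns at least 0.10. Building the index from many individuals rather than a single reference is the idea the abstract describes. The set-based lookup here only stands in for the optimized database.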
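- Authors:
- Ames, Sasha K.; Gardner, Shea N.; Marti, Jose Manuel; Slezak, Tom R.; Gokhale, Maya B.; Allen, Jonathan E.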
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Center for Applied Scientific Computing
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Global Security Computer Applications Division
- Univ. of Valencia, Valencia (Spain). Inst. de Fisica Corpuscular
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States). Global Security Computer Applications Division
- Publication Date:
- 2015-04-29
- Research Org.:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1263571
- Grant/Contract Number:
- AC52-07NA27344; 33-ER-2012; 08-ER-2011
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Genome Research
- Additional Journal Information:
- Journal Volume: 25; Journal Issue: 7; Journal ID: ISSN 1088-9051
- Publisher:
- Cold Spring Harbor Laboratory Press
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Ames, Sasha K., Gardner, Shea N., Marti, Jose Manuel, Slezak, Tom R., Gokhale, Maya B., and Allen, Jonathan E. Using populations of human and microbial genomes for organism detection in metagenomes. United States: N. p., 2015. Web. doi:10.1101/gr.184879.114.
Ames, Sasha K., Gardner, Shea N., Marti, Jose Manuel, Slezak, Tom R., Gokhale, Maya B., & Allen, Jonathan E. Using populations of human and microbial genomes for organism detection in metagenomes. United States. https://doi.org/10.1101/gr.184879.114
Ames, Sasha K., Gardner, Shea N., Marti, Jose Manuel, Slezak, Tom R., Gokhale, Maya B., and Allen, Jonathan E. 2015. "Using populations of human and microbial genomes for organism detection in metagenomes". United States. https://doi.org/10.1101/gr.184879.114. https://www.osti.gov/servlets/purl/1263571.
@article{osti_1263571,
title = {Using populations of human and microbial genomes for organism detection in metagenomes},
author = {Ames, Sasha K. and Gardner, Shea N. and Marti, Jose Manuel and Slezak, Tom R. and Gokhale, Maya B. and Allen, Jonathan E.},
abstractNote = {Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. In conclusion, left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.},
doi = {10.1101/gr.184879.114},
journal = {Genome Research},
number = 7,
volume = 25,
place = {United States},
year = {2015},
month = {4}
}
Works referenced in this record:
Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes
journal, May 2013
- Albertsen, Mads; Hugenholtz, Philip; Skarshewski, Adam
- Nature Biotechnology, Vol. 31, Issue 6
DNA signatures for detecting genetic engineering in bacteria
journal, January 2008
- Allen, Jonathan E.; Gardner, Shea N.; Slezak, Tom R.
- Genome Biology, Vol. 9, Issue 3
Basic Local Alignment Search Tool
journal, October 1990
- Altschul, S.
- Journal of Molecular Biology, Vol. 215, Issue 3
Scalable metagenomic taxonomy classification using a reference genome database
journal, July 2013
- Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.
- Bioinformatics, Vol. 29, Issue 18
Design and Optimization of a Metagenomics Analysis Workflow for NVRAM
conference, May 2014
- Ames, Sasha; Allen, Jonathan E.; Hysom, David A.
- 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
Comparison of Sequencing Utility Programs
journal, January 2013
- Aronesty, Erik
- The Open Bioinformatics Journal, Vol. 7, Issue 1
GenBank
journal, November 2012
- Benson, Dennis A.; Cavanaugh, Mark; Clark, Karen
- Nucleic Acids Research, Vol. 41, Issue D1
Rapid phylogenetic and functional classification of short genomic fragments with signature peptides
journal, January 2012
- Berendzen, Joel; Bruno, William J.; Cohn, Judith D.
- BMC Research Notes, Vol. 5, Issue 1
Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data
journal, January 2014
- Byrd, Allyson L.; Perez-Rogers, Joseph F.; Manimaran, Solaiappan
- BMC Bioinformatics, Vol. 15, Issue 1
Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm
journal, April 2014
- Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta
- PLoS ONE, Vol. 9, Issue 4
Geographic population structure analysis of worldwide human populations infers their biogeographical origins
journal, April 2014
- Elhaik, Eran; Tatarinova, Tatiana; Chebotarev, Dmitri
- Nature Communications, Vol. 5, Issue 1
Bacterial genome sequencing in the clinic: bioinformatic challenges and solutions
journal, November 2013
- Fricke, W. Florian; Rasko, David A.
- Nature Reviews Genetics, Vol. 15, Issue 1
CD-HIT: accelerated for clustering the next-generation sequencing data
journal, October 2012
- Fu, Limin; Niu, Beifang; Zhu, Zhengwei
- Bioinformatics, Vol. 28, Issue 23
lobSTR: A short tandem repeat profiler for personal genomes
journal, April 2012
- Gymrek, Melissa; Golan, David; Rosset, Saharon
- Genome Research, Vol. 22, Issue 6
Novel Plasmid and Its Variant Harboring both a blaNDM-1 Gene and Type IV Secretion System in Clinical Isolates of Acinetobacter lwoffii
journal, April 2012
- Hu, Hongyan; Hu, Yongfei; Pan, Yuanlong
- Antimicrobial Agents and Chemotherapy, Vol. 56, Issue 4
Short read alignment with populations of genomes
journal, June 2013
- Huang, L.; Popic, V.; Batzoglou, S.
- Bioinformatics, Vol. 29, Issue 13
Repbase Update, a database of eukaryotic repetitive elements
journal, January 2005
- Jurka, J.; Kapitonov, V. V.; Pavlicek, A.
- Cytogenetic and Genome Research, Vol. 110, Issue 1-4
A Physicians' Wish List for the Clinical Application of Intestinal Metagenomics
journal, April 2014
- Klymiuk, Ingeborg; Högenauer, Christoph; Halwachs, Bettina
- PLoS Medicine, Vol. 11, Issue 4
Mycoplasma contamination in the 1000 Genomes Project
journal, April 2014
- Langdon, William B.
- BioData Mining, Vol. 7, Issue 1
Common Contaminants in Next-Generation Sequencing That Hinder Discovery of Low-Abundance Microbes
journal, May 2014
- Laurence, Martin; Hatzis, Christos; Brash, Douglas E.
- PLoS ONE, Vol. 9, Issue 5
Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences
journal, January 2011
- Liu, Bo; Gibbons, Theodore; Ghodsi, Mohammad
- BMC Genomics, Vol. 12, Issue Suppl 2
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
journal, January 2011
- Marçais, Guillaume; Kingsford, Carl
- Bioinformatics, Vol. 27, Issue 6
Optimizing Read Mapping to Reference Genomes to Determine Composition and Species Prevalence in Microbial Communities
journal, June 2012
- Martin, John; Sykes, Sean; Young, Sarah
- PLoS ONE, Vol. 7, Issue 6
SIANN: Strain Identification by Alignment to Near Neighbors
posted_content, January 2014
- Minot, Samuel S.; Turner, Stephen D.; Ternus, Krista L.
A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples
journal, June 2014
- Naccache, S. N.; Federman, S.; Veeraraghavan, N.
- Genome Research, Vol. 24, Issue 7
Direct Metagenomic Detection of Viral Pathogens in Nasal and Fecal Specimens Using an Unbiased High-Throughput Sequencing Approach
journal, January 2009
- Nakamura, Shota; Yang, Cheng-Song; Sakon, Naomi
- PLoS ONE, Vol. 4, Issue 1
Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
journal, July 2014
- Nielsen, H. Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska
- Nature Biotechnology, Vol. 32, Issue 8
Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European
journal, January 2014
- Olalde, Iñigo; Allentoft, Morten E.; Sánchez-Quinto, Federico
- Nature, Vol. 507, Issue 7491
Reagent and laboratory contamination can critically impact sequence-based microbiome analyses
journal, November 2014
- Salter, Susannah J.; Cox, Michael J.; Turek, Elena M.
- BMC Biology, Vol. 12, Issue 1
Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets
journal, March 2011
- Schmieder, Robert; Edwards, Robert
- PLoS ONE, Vol. 6, Issue 3
Metagenomic microbial community profiling using unique clade-specific marker genes
journal, June 2012
- Segata, Nicola; Waldron, Levi; Ballarini, Annalisa
- Nature Methods, Vol. 9, Issue 8
Metagenomic species profiling using universal phylogenetic marker genes
journal, October 2013
- Sunagawa, Shinichi; Mende, Daniel R.; Zeller, Georg
- Nature Methods, Vol. 10, Issue 12
MePIC, Metagenomic Pathogen Identification for Clinical Specimens
journal, January 2014
- Takeuchi, Fumihiko; Sekizuka, Tsuyoshi; Yamashita, Akifumi
- Japanese Journal of Infectious Diseases, Vol. 67, Issue 1
Strain/species identification in metagenomes using genome-specific markers
journal, February 2014
- Tu, Qichao; He, Zhili; Zhou, Jizhong
- Nucleic Acids Research, Vol. 42, Issue 8
DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications
conference, November 2012
- Van Essen, Brian; Hsieh, Henry; Ames, Sasha
- 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
The landscape of human STR variation
journal, August 2014
- Willems, Thomas; Gymrek, Melissa; Highnam, Gareth
- Genome Research, Vol. 24, Issue 11
Kraken: ultrafast metagenomic sequence classification using exact alignments
journal, January 2014
- Wood, Derrick E.; Salzberg, Steven L.
- Genome Biology, Vol. 15, Issue 3
Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing
journal, February 2012
- Yozwiak, Nathan L.; Skewes-Cox, Peter; Stenglein, Mark D.
- PLoS Neglected Tropical Diseases, Vol. 6, Issue 2
RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data
journal, October 2011
- Zhao, Y.; Tang, H.; Ye, Y.
- Bioinformatics, Vol. 28, Issue 1
Identification of Novel Viruses Using VirusHunter -- an Automated Data Analysis Pipeline
journal, October 2013
- Zhao, Guoyan; Krishnamurthy, Siddharth; Cai, Zhengqiu
- PLoS ONE, Vol. 8, Issue 10
Works referencing / citing this record:
Recentrifuge: Robust comparative analysis and contamination removal for metagenomics
journal, April 2019
- Martí, Jose Manuel
- PLOS Computational Biology, Vol. 15, Issue 4
Host genetic variation impacts microbiome composition across human body sites
journal, September 2015
- Blekhman, Ran; Goodrich, Julia K.; Huang, Katherine
- Genome Biology, Vol. 16, Issue 1
Host‐derived population genomics data provides insights into bacterial and diatom composition of the killer whale skin
journal, October 2018
- Hooper, Rebecca; Brealey, Jaelle C.; van der Valk, Tom
- Molecular Ecology, Vol. 28, Issue 2
A novel variant of torque teno virus 7 identified in patients with Kawasaki disease
journal, December 2018
- Thissen, James B.; Isshiki, Mariko; Jaing, Crystal
- PLOS ONE, Vol. 13, Issue 12
Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection
journal, July 2015
- Breitwieser, Florian P.; Pardo, Carlos A.; Salzberg, Steven L.
- F1000Research, Vol. 4
Embracing the gut microbiota: the new frontier for inflammatory and infectious diseases
journal, January 2017
- van den Elsen, Lieke WJ; Poyntz, Hazel C.; Weyrich, Laura S.
- Clinical & Translational Immunology, Vol. 6, Issue 1
Host-derived population genomics data provides insights into bacterial and diatom composition of the killer whale skin
posted_content, July 2018
- Hooper, Rebecca; Brealey, Jaelle C.; van der Valk, Tom
- Molecular Ecology
Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome
journal, December 2018
- Vich Vila, Arnau; Imhann, Floris; Collij, Valerie
- Science Translational Medicine, Vol. 10, Issue 472
Health and Disease Imprinted in the Time Variability of the Human Microbiome
journal, March 2017
- Martí, Jose Manuel; Martínez-Martínez, Daniel; Rubio, Teresa
- mSystems, Vol. 2, Issue 2