
DOE PAGES

Title: Strategies to improve reference databases for soil microbiomes

A database of curated genomes is needed to better assess soil microbial communities and their processes associated with differing land management and environmental impacts. Interpreting soil metagenomic datasets with existing sequence databases is challenging because these databases are biased towards medical and biotechnology research and can result in misleading annotations. We have curated a database, RefSoil, of 928 genomes of soil-associated organisms (888 bacteria, 34 archaea, and 6 fungi). Using this database as a representation of the current state of knowledge of well-characterized soil microbes, we evaluated its composition and compared it to broader microbial databases, specifically NCBI’s RefSeq, as well as to 3,035 publicly available soil amplicon datasets. These comparisons identified phyla and functions that are enriched in soils as well as those that may be underrepresented in RefSoil. For example, RefSoil was observed to overrepresent Firmicutes despite their low abundance in soil environments and to lack representation of Acidobacteria and Verrucomicrobia, which are abundant in soils. Our comparison of RefSoil to soil amplicon datasets allowed us to identify targets that, if cultured or sequenced, would significantly increase the biodiversity represented within RefSoil. To demonstrate the opportunities to access these underrepresented targets, we employed single-cell genomics in a pilot experiment to recover 14 genomes from the "most wanted" list, which improved RefSoil's representation of Earth Microbiome Project (EMP) sequences by 7% by abundance. This effort demonstrates the value of RefSoil in guiding future research efforts and the capability of single-cell genomics as a practical means to fill existing genomic data gaps.
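The idea of measuring a reference database's "representation ... by abundance" and ranking unmatched taxa as a "most wanted" list can be illustrated with a small calculation. The following is a minimal Python sketch, assuming representation means the fraction of all amplicon reads that fall in OTUs matched by at least one reference genome, and assuming the "most wanted" list simply ranks unmatched OTUs by pooled read count. The function names, OTU identifiers, and read counts are hypothetical toy values; they are not the authors' pipeline or data.

```python
from typing import List, Mapping, Set


def abundance_weighted_coverage(otu_counts: Mapping[str, int], matched: Set[str]) -> float:
    """Return the fraction of all reads that fall in OTUs matched by the reference set."""
    total = sum(otu_counts.values())
    if total == 0:
        return 0.0  # no reads: define coverage as zero rather than dividing by zero
    covered = sum(n for otu, n in otu_counts.items() if otu in matched)
    return covered / total


def most_wanted(otu_counts: Mapping[str, int], matched: Set[str], k: int) -> List[str]:
    """Return the k most abundant OTUs that have no match in the reference set."""
    unmatched = [(n, otu) for otu, n in otu_counts.items() if otu not in matched]
    # Sort by descending read count, breaking ties by OTU name for determinism.
    unmatched.sort(key=lambda pair: (-pair[0], pair[1]))
    return [otu for _, otu in unmatched[:k]]


# Hypothetical toy data: OTU -> reads pooled across amplicon samples.
counts = {"otu_a": 500, "otu_b": 300, "otu_c": 150, "otu_d": 50}
reference = {"otu_a"}  # OTUs already matched by the (hypothetical) reference database

before = abundance_weighted_coverage(counts, reference)
targets = most_wanted(counts, reference, k=1)
after = abundance_weighted_coverage(counts, reference | set(targets))
print(targets, f"{before:.2f} -> {after:.2f}")  # ['otu_b'] 0.50 -> 0.80
```

Ranking by pooled read count is the simplest possible choice and shows why sequencing a few abundant, unmatched taxa can raise abundance-weighted coverage disproportionately; the study's actual criteria for building its "most wanted" list may differ.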
Authors:
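Choi, Jinlyung [1]; Yang, Fan [1]; Stepanauskas, Ramunas [2]; Cardenas, Erick [3]; Garoutte, Aaron [4]; Williams, Ryan [1]; Flater, Jared [1]; Tiedje, James M. [4]; Hofmockel, Kirsten S. [5]; Gelder, Brian [1]; Howe, Adina [1]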
  1. Iowa State Univ., Ames, IA (United States). Dept. of Agricultural and Biosystems Engineering
  2. Bigelow Lab. for Ocean Sciences, East Boothbay, ME (United States)
  3. Univ. of British Columbia, Vancouver, BC (Canada). Dept. of Microbiology & Immunology
  4. Michigan State Univ., East Lansing, MI (United States). Center for Microbial Ecology
  5. Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Environmental Molecular Sciences Lab. (EMSL); Iowa State Univ., Ames, IA (United States). Dept. of Ecology, Evolution and Organismal Biology
Publication Date:
December 2016
Report Number(s):
PNNL-SA-122172
Journal ID: ISSN 1751-7362
Grant/Contract Number:
AC05-76RL01830; SC0010775
Type:
Accepted Manuscript
Journal Name:
The ISME Journal
Additional Journal Information:
Journal Volume: 11; Journal Issue: 4; Journal ID: ISSN 1751-7362
Publisher:
Nature Publishing Group
Research Org:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Science Foundation (NSF)
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION; 59 BASIC BIOLOGICAL SCIENCES
OSTI Identifier:
1353315

Choi, Jinlyung, Yang, Fan, Stepanauskas, Ramunas, Cardenas, Erick, Garoutte, Aaron, Williams, Ryan, Flater, Jared, Tiedje, James M., Hofmockel, Kirsten S., Gelder, Brian, and Howe, Adina. Strategies to improve reference databases for soil microbiomes. United States: N. p., 2016. Web. doi:10.1038/ismej.2016.168.
Choi, Jinlyung, Yang, Fan, Stepanauskas, Ramunas, Cardenas, Erick, Garoutte, Aaron, Williams, Ryan, Flater, Jared, Tiedje, James M., Hofmockel, Kirsten S., Gelder, Brian, & Howe, Adina. Strategies to improve reference databases for soil microbiomes. United States. doi:10.1038/ismej.2016.168.
Choi, Jinlyung, Yang, Fan, Stepanauskas, Ramunas, Cardenas, Erick, Garoutte, Aaron, Williams, Ryan, Flater, Jared, Tiedje, James M., Hofmockel, Kirsten S., Gelder, Brian, and Howe, Adina. 2016. "Strategies to improve reference databases for soil microbiomes". United States. doi:10.1038/ismej.2016.168. https://www.osti.gov/servlets/purl/1353315.
@article{osti_1353315,
title = {Strategies to improve reference databases for soil microbiomes},
author = {Choi, Jinlyung and Yang, Fan and Stepanauskas, Ramunas and Cardenas, Erick and Garoutte, Aaron and Williams, Ryan and Flater, Jared and Tiedje, James M. and Hofmockel, Kirsten S. and Gelder, Brian and Howe, Adina},
abstractNote = {A database of curated genomes is needed to better assess soil microbial communities and their processes associated with differing land management and environmental impacts. Interpreting soil metagenomic datasets with existing sequence databases is challenging because these databases are biased towards medical and biotechnology research and can result in misleading annotations. We have curated a database, RefSoil, of 928 genomes of soil-associated organisms (888 bacteria, 34 archaea, and 6 fungi). Using this database as a representation of the current state of knowledge of well-characterized soil microbes, we evaluated its composition and compared it to broader microbial databases, specifically NCBI’s RefSeq, as well as to 3,035 publicly available soil amplicon datasets. These comparisons identified phyla and functions that are enriched in soils as well as those that may be underrepresented in RefSoil. For example, RefSoil was observed to overrepresent Firmicutes despite their low abundance in soil environments and to lack representation of Acidobacteria and Verrucomicrobia, which are abundant in soils. Our comparison of RefSoil to soil amplicon datasets allowed us to identify targets that, if cultured or sequenced, would significantly increase the biodiversity represented within RefSoil. To demonstrate the opportunities to access these underrepresented targets, we employed single-cell genomics in a pilot experiment to recover 14 genomes from the "most wanted" list, which improved RefSoil's representation of Earth Microbiome Project (EMP) sequences by 7% by abundance. This effort demonstrates the value of RefSoil in guiding future research efforts and the capability of single-cell genomics as a practical means to fill existing genomic data gaps.},
doi = {10.1038/ismej.2016.168},
journal = {The ISME Journal},
number = 4,
volume = 11,
place = {United States},
year = {2016},
month = dec
}