Title: Strategies to improve reference databases for soil microbiomes

A database of curated genomes is needed to better assess soil microbial communities and their processes associated with differing land management and environmental impacts. Interpreting soil metagenomic datasets with existing sequence databases is challenging because these databases are biased towards medical and biotechnology research and can result in misleading annotations. We have curated a database, RefSoil, of 928 genomes of soil-associated organisms (888 bacteria, 34 archaea, and 6 fungi). Using this database as a representation of the current state of knowledge of well-characterized soil microbes, we evaluated its composition and compared it to broader microbial databases, specifically NCBI’s RefSeq, as well as 3,035 publicly available soil amplicon datasets. These comparisons identified phyla and functions that are enriched in soils as well as those that may be underrepresented in RefSoil. For example, RefSoil was observed to overrepresent Firmicutes despite their low abundance in soil environments, and it lacked representation of Acidobacteria and Verrucomicrobia, which are abundant in soils. Our comparison of RefSoil to soil amplicon datasets allowed us to identify targets that, if cultured or sequenced, would significantly increase the biodiversity represented within RefSoil. To demonstrate the opportunities to access these underrepresented targets, we employed single-cell genomics in a pilot experiment to recover 14 genomes from the "most wanted" list, which improved RefSoil's representation of Earth Microbiome Project (EMP) sequences by 7% by abundance. This effort demonstrates the value of RefSoil in guiding future research efforts and the capability of single-cell genomics as a practical means to fill the existing genomic data gaps.
[1]; [1]; [2]; [3]; [4]; [1]; [1]; [4]; [5]; [1]; [1]
  1. Iowa State Univ., Ames, IA (United States). Dept. of Agricultural and Biosystems Engineering
  2. Bigelow Lab. for Ocean Sciences, East Boothbay, ME (United States)
  3. Univ. of British Columbia, Vancouver, BC (Canada). Dept. of Microbiology & Immunology
  4. Michigan State Univ., East Lansing, MI (United States). Center for Microbial Ecology
  5. Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Environmental Molecular Sciences Lab. (EMSL); Iowa State Univ., Ames, IA (United States). Dept. of Ecology, Evolution and Organismal Biology
Publication Date:
Report Number(s):
Journal ID: ISSN 1751-7362
Grant/Contract Number:
AC05-76RL01830; SC0010775
Resource Type:
Accepted Manuscript
Journal Name:
The ISME Journal
Additional Journal Information:
Journal Volume: 11; Journal Issue: 4; Journal ID: ISSN 1751-7362
Publisher:
Nature Publishing Group
Research Org:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Science Foundation (NSF)
Country of Publication:
United States
OSTI Identifier: