Generated July 21, 2023

GROWdb US River Systems

PI: Mikayla Borton

ID: growdb:01

Please cite this project and all data from these samples as:

Borton, et al. (2022) GROWdb US River Systems - Samples. [Data set]. DOE Systems Biology Knowledgebase. doi:10.25982/109073.30/1895615.

GROW Overview

We developed the Genome Resolved Open Watersheds database (GROWdb), which aims to increase genomic sampling and understanding of global river microbiomes. An emphasis of GROWdb is to create a publicly available and ever-expanding microbial genome database that is focused on rivers while being interoperable with databases from other ecosystems. GROWdb is based on a network-of-networks approach to move beyond a small collection of well-studied rivers, towards a spatially distributed, global network of systematic observations. GROWdb represents the first microbial, river-focused resource parsed at various scales, from genes to MAGs to the community level, including expression- and potential-based measurements that will be of interest to microbiologists, ecologists, geochemists, hydrologists, and modelers.

Dataset Acknowledgement

GROWdb contains data from various research campaigns. Please acknowledge the following data generators, as appropriate:

  • WHONDRS derived genomes or samples - include this statement in your acknowledgements: “This study used data from the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) under the River Corridor Science Focus Area (SFA) at the Pacific Northwest National Laboratory (PNNL) that was generated at the U.S. Department of Energy (DOE) Joint Genome Institute User Facility. PNNL is operated by Battelle Memorial Institute for the U.S. DOE under Contract No. DE-AC05-76RL01830. The SFA is supported by the U.S. DOE, Office of Biological and Environmental Research (BER), Environmental System Science (ESS) Program.”

Total Samples loaded onto this Narrative: 178

Note: Not all GROW samples may be loaded into KBase

Data Availability

The data underlying GROWdb are accessible across various platforms to ensure all levels of data structure are widely available. First, all reads and MAGs are publicly hosted at the National Center for Biotechnology Information (NCBI) under BioProject PRJNA946291. Second, all related data presented here, including MAG annotations, extended data tables, phylogenetic tree files, antibiotic resistance gene database files, and MAG abundance tables, are available in Zenodo (link).
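
As a minimal sketch of how the reads under the BioProject accession above might be located programmatically, the snippet below builds a search URL for NCBI's public E-utilities `esearch` endpoint. The function name, the `retmax` default, and the choice of the `sra` database with a `[BioProject]` field tag are illustrative assumptions, not part of GROWdb itself; the snippet only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (public service).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# BioProject accession quoted in the Data Availability text.
GROWDB_BIOPROJECT = "PRJNA946291"


def build_sra_search_url(bioproject: str, retmax: int = 20) -> str:
    """Return an esearch URL that lists SRA records linked to a BioProject.

    Assumption: the records of interest are indexed in the 'sra' database
    and can be filtered with the [BioProject] field tag.
    """
    params = {
        "db": "sra",
        "term": f"{bioproject}[BioProject]",
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; fetching it is left to the reader's HTTP client of choice.
    print(build_sra_search_url(GROWDB_BIOPROJECT))
```

The resulting URL can be opened in a browser or passed to any HTTP client; NCBI's usage guidelines for request rates apply.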

Beyond the flat database files listed above, our aim for GROWdb was to maximize data use by making the data available in searchable and interactive platforms, including the National Microbiome Data Collaborative (NMDC) data portal, the Department of Energy’s Systems Biology Knowledgebase (KBase), and a GROW-specific user interface released here, GROWdb Explorer. Each platform provides different ways to interact with GROWdb:

  • NMDC: GROWdb formed a pilot project for the NMDC. Specifically, individual GROWdb datasets (metagenomes, metatranscriptomes, etc.) are easily accessible and searchable through the NMDC data portal, where they are systematically connected to each other and to a rich suite of sample information and standard analysis results, following Findable, Accessible, Interoperable, and Reusable (FAIR) data practices.
  • KBase: GROWdb is publicly available within KBase, including samples (this Narrative), MAGs, and corresponding genome scale metabolic models. Access within KBase allows for immediate access and reuse of data, including comparison to private data using KBase’s 500+ analysis tools.

Other linked narratives in KBase:

  • GROWdb Explorer: GROWdb data is also explorable through a graphical user interface built through the Colorado State University Geospatial Centroid (https://geocentroid.shinyapps.io/GROWdatabase/), allowing users to search and graph microbial and spatial data simultaneously.

In summary, this microbial genome resource represents the first publicly available genome collection from rivers and offers data that can be leveraged across microbiome studies. GROWdb is an expanding repository to incorporate and unify global river multi-omic data for the future.

Metagenome & Metatranscriptome sampling locations across USA

figure1

Data that is linked to the samples in this Narrative

Narrative Content       Info                           Narrative Link
GROW MAGs               Dereplicated MAGs: 2093        https://narrative.kbase.us/narrative/106867
GROW Metabolic Models   Models derived based on MAGs   https://narrative.kbase.us/narrative/107864

Samples Upload demonstration - an example file can be accessed here

import some samples
This app completed without errors in 3m 14s.
Objects
Created Object Name           Type        Description
growdb_us_surface_SampleSet   SampleSet
Summary
SampleSet object named "growdb_us_surface_SampleSet" imported.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/109073
  • Surface_US_and_batch_IEGRW_1304_1644501085.xls - Input file provided to create the sample set.
v3 - KBaseSets.SampleSet-2.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/109073
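
Before running an import like the one shown above, a sample sheet can be sanity-checked locally. The sketch below is only an illustration: it assumes the sheet has been exported as tab-separated text, and the column names `name`, `latitude`, and `longitude` are hypothetical stand-ins, since the headers of the actual KBase sample template are not given in this Narrative.

```python
import csv
import io

# Hypothetical column names used for illustration only; the real
# sample template may use different headers.
REQUIRED_COLUMNS = ("name", "latitude", "longitude")


def check_sample_rows(text: str) -> list[str]:
    """Return human-readable problems found in a tab-separated sample sheet."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = (row["name"] or "").strip()
        if not name:
            problems.append(f"row {line_no}: empty sample name")
        elif name in seen:
            problems.append(f"row {line_no}: duplicate sample name {name!r}")
        seen.add(name)
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (TypeError, ValueError):
            problems.append(f"row {line_no}: latitude/longitude not numeric")
            continue
        # Basic geographic bounds check in decimal degrees.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"row {line_no}: coordinates out of range")
    return problems
```

An empty list means no problems were detected by these particular checks; it does not guarantee that the import app will accept the file.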

Apps

  1. Import Samples
    no citations