U.S. Department of Energy
Office of Scientific and Technical Information

Ultrahigh-resolution mass spectrometry data associated with the manuscript “A functional microbiome catalog crowdsourced from North American rivers”

Dataset · DOI: https://doi.org/10.15485/2439202 · OSTI ID: 2439202

This data package is associated with the publication “A functional microbiome catalog crowdsourced from North American rivers,” submitted to Nature (Borton et al., 2024; https://www.biorxiv.org/content/10.1101/2023.07.22.550117v1). Predicting elemental cycles and maintaining water quality under increasing anthropogenic influence requires understanding the spatial drivers of river microbiomes. However, efforts to identify the unifying microbial determinants governing river biogeochemistry are hindered by a lack of genome-resolved functional insights and of sampling across multiple rivers. Here we employed a community science effort to accelerate the sampling of river microbiomes and create the Genome Resolved Open Watersheds database (GROWdb). GROWdb is a publicly available resource that paves the way for watershed predictive modeling and microbiome-based management practices. This resource profiled the identity, distribution, function, and expression of thousands of microbial genomes across rivers covering 90% of United States watersheds. We identified the most cosmopolitan microbiome members, while also revealing local drivers of strain endemism across ecological dimensions. We provide the first evidence that microbial functional trait expression followed the tenets of the River Continuum Concept, suggesting that the structure and function of river microbiomes are predictable.
The Fourier-transform ion cyclotron resonance mass spectrometry (FTICR-MS) data were one of many data types used to establish the ecological dimensions along which different microbes were detected. This data package contains only the processed FTICR-MS data associated with this manuscript; all other data are accessible via Zenodo (https://zenodo.org/records/8173287), GitHub (https://github.com/jmikayla1991/Genome-Resolved-Open-Watersheds-database-GROWdb), KBase (https://doi.org/10.25982/109073.30/1895615), and NCBI via BioProject PRJNA946291.
This dataset consists of (1) a file-level metadata (flmd) file; (2) a data dictionary (dd) file; (3) a readme; and (4) three processed FTICR-MS data files (a ‘data’ file containing peak-by-sample observations, a ‘mol’ file containing peak metadata, and a transformation profile containing transformation-by-sample observations). All files are .csv or .pdf.
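As a minimal sketch of how the ‘data’ and ‘mol’ files relate, the Python below joins peak-by-sample observations to peak metadata on a shared peak identifier. Everything specific in it is an assumption for illustration: the column names (Mass, MolForm, Sample_A, Sample_B), the presence/absence encoding, and the toy values are hypothetical and are not taken from this package. The actual file names and column definitions are given in the flmd and dd files.

```python
import csv
import io

# Hypothetical miniature stand-ins for the 'data' and 'mol' files.
# Real column names and values must come from the package's dd and readme.
DATA_CSV = """Mass,Sample_A,Sample_B
192.0423,1,0
300.1000,1,1
400.2000,0,1
"""
MOL_CSV = """Mass,MolForm
192.0423,C10H8O4
300.1000,
400.2000,
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_peak(data_rows, mol_rows, key="Mass"):
    """Attach peak metadata to each peak-by-sample row via the shared key."""
    mol_by_peak = {row[key]: row for row in mol_rows}
    joined = []
    for row in data_rows:
        meta = mol_by_peak.get(row[key], {})
        joined.append({**meta, **row})
    return joined


def peaks_per_sample(data_rows, key="Mass"):
    """Count peaks with a nonzero value in each sample column."""
    samples = [col for col in data_rows[0] if col != key]
    return {s: sum(1 for r in data_rows if float(r[s]) > 0) for s in samples}


data_rows = read_rows(DATA_CSV)
mol_rows = read_rows(MOL_CSV)
joined = join_on_peak(data_rows, mol_rows)
counts = peaks_per_sample(data_rows)
```

To try this against the real files, replace the in-memory strings with open file handles and set `key` to whichever peak identifier column the data dictionary documents.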

Research Organization:
Environmental System Science Data Infrastructure for a Virtual Ecosystem; River Corridor and Watershed Biogeochemistry SFA
Sponsoring Organization:
U.S. DOE > Office of Science > Biological and Environmental Research (BER)
OSTI ID:
2439202
Country of Publication:
United States
Language:
English