DOE Data Explorer, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Artificial intelligence models, photos, and data associated with the manuscript “Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO” (v2)

Abstract

This data package is associated with the manuscript “Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO” published in Water Resources Research (Chen et al., 2024). This data package includes the training, validation, testing, and prediction data used by the artificial intelligence (AI) model for automated grain size and hydro-biogeochemistry quantification using streambed photos. The grain size data are extracted for each photo using You Only Look Once (YOLO), a pre-trained object detection model. This data package was originally published in October 2023. It was updated in August 2025 (v2; new and modified files). File and folder names were not revised to indicate changes. See the change history section in the readme for more details. Please see flmd.csv for a list of all files contained in this data package and descriptions for each. Please see dd.csv for a data dictionary that defines the column headers of .csv files in the data package. This dataset comprises one data folder containing (1) file-level metadata; (2) data dictionary; (3) readme; and (4) six subfolders. Subfolders 1 to 4 include the training, validation, testing, and prediction data. Subfolder 5_Summary includes the summary results of different combinations of training, validation, testing, and prediction data. Subfolder 6_SupplementalData includes additional data downloaded from public sources (Kaufman et al., 2023a; Kaufman et al., 2023b; Garefalakis et al., 2023; Mair et al., 2024; https://github.com/river-corridors-sfa/Geospatial_variables). In total, the data package includes 110 folders and 44,283 files. These files include 9,047 .jpg photos, 1 .png photo, and 3 .tif photos; 26,639 files of photo labels and AI-predicted individual grain sizes and probabilities (.txt); 8,447 grain size distribution data files (.dat); 126 CSV files for results summary; and 14 required metadata files (.xlsx).
The summary CSV files contain 68 columns and approximately 2,200 rows that represent photo names, site locations, recording time, GPS coordinates, grain sizes (D10, D50, D60, and D84), number of grains, and additional hydro-biogeochemical data such as water depth, flow velocity, Manning’s coefficient, friction factor, hydraulic conductivity, permeability, streambed interstitial velocity magnitude, mass transfer rate, and nitrate uptake velocity. The photos were obtained from 75 sites in the Yakima River Basin and the Columbia River shorelines, and other associated data from samples and sensors obtained when the photos were taken are publicly available (Fulton et al. 2022; Grieger et al. 2023). All files are .csv, .txt, .dat, .jpg, or .pdf.
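
For readers unfamiliar with the D10, D50, D60, and D84 notation used in the summary files, the following is a minimal Python sketch of how such characteristic grain sizes can be derived from a list of individual grain diameters. It assumes a count-based percentile with linear interpolation between sorted values; this is an illustrative assumption and not necessarily the method used by the authors. The function name `grain_percentile` and the sample diameters are hypothetical and are not taken from the data package.

```python
from typing import Sequence


def grain_percentile(sizes: Sequence[float], pct: float) -> float:
    """Return the grain size below which `pct` percent of grains fall.

    Assumption: count-based percentile with linear interpolation between
    neighbouring sorted values (not necessarily the authors' method).
    """
    if not sizes:
        raise ValueError("need at least one grain size")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(sizes)
    # Fractional rank into the sorted list, then interpolate linearly.
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Hypothetical grain diameters in millimetres (illustrative only).
sizes_mm = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]

# Characteristic sizes matching the percentiles named in the abstract.
summary = {f"D{p}": grain_percentile(sizes_mm, p) for p in (10, 50, 60, 84)}
print(summary)
```

An area-weighted variant would weight each grain by its detected area rather than counting every grain equally; the sketch above covers only the count-based case.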

Authors:
Chen, Yunxiang; Bao, Jie; Chen, Yao; Li, Bing; Yang, Yuan; Stegen, James C.; Renteria, Lupita; Delgado, Dillman; Forbes, Brieanne; Goldman, Amy E.; Simhan, Manasi; Barnes, Morgan; Laan, Maggi; McKever, Sophia A.; Hou, Zhangshuan; Chen, Xingyuan; Scheibe, Timothy D.
  1. Pacific Northwest National Laboratory
  2. Yangtze University
  3. University of California San Diego
  4. University of Wisconsin - Madison
Publication Date:
2023-09-12
DOE Contract Number:  
AC02-05CH11231
Research Org.:
River Corridor and Watershed Biogeochemistry SFA
Sponsoring Org.:
U.S. DOE > Office of Science > Biological and Environmental Research (BER)
Subject:
54 ENVIRONMENTAL SCIENCES; AI; Area; Area ratio; Biogeochemistry; Catchment; D10 area; D10 count; D5 area; D5 count; D50 area; D50 count; D84 area; D84 count; Date; Depth; Drainage area; EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SURFACE WATER; EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SURFACE WATER > SURFACE WATER PROCESSES/MEASUREMENTS > WATER DEPTH; EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SURFACE WATER > WATERSHED CHARACTERISTICS > WATERSHED DRAINAGE; EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SURFACE WATER > WATERSHED CHARACTERISTICS > WATERSHED SLOPE; ESS-DIVE CSV File Formatting Guidelines Reporting Format; ESS-DIVE File Level Metadata Reporting Format; ESS-DIVE Model Data Archiving Guidelines; Freshwater; Friction factor; Grain number; Grain size; Height; Hydraulic exchange; Hydrobiogeochemical function; Hydrology; Hyporheic zone; Interstitial velocity magnitude; ML; Machine learning; Mass transfer; ModEx; Nitrate uptake velocity; Permeability; Photo resolution; Pixel; Relative error; Resolution; River; River corridor; Scale; Shear velocity; Slope; Stream; Stream order; Streambed; Streambed hydro-biogeochemistry; Velocity; Watershed; YOLO model; You Only Look Once; altitude; latitude; longitude
OSTI Identifier:
1999774
DOI:
https://doi.org/10.15485/1999774

Citation Formats

Chen, Yunxiang, Bao, Jie, Chen, Yao, Li, Bing, Yang, Yuan, Stegen, James C., Renteria, Lupita, Delgado, Dillman, Forbes, Brieanne, Goldman, Amy E., Simhan, Manasi, Barnes, Morgan, Laan, Maggi, McKever, Sophia A., Hou, Zhangshuan, Chen, Xingyuan, and Scheibe, Timothy D. Artificial intelligence models, photos, and data associated with the manuscript “Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO” (v2). United States: N. p., 2023. Web. doi:10.15485/1999774.
Chen, Yunxiang, Bao, Jie, Chen, Yao, Li, Bing, Yang, Yuan, Stegen, James C., Renteria, Lupita, Delgado, Dillman, Forbes, Brieanne, Goldman, Amy E., Simhan, Manasi, Barnes, Morgan, Laan, Maggi, McKever, Sophia A., Hou, Zhangshuan, Chen, Xingyuan, & Scheibe, Timothy D. Artificial intelligence models, photos, and data associated with the manuscript “Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO” (v2). United States. doi:https://doi.org/10.15485/1999774
Chen, Yunxiang, Bao, Jie, Chen, Yao, Li, Bing, Yang, Yuan, Stegen, James C., Renteria, Lupita, Delgado, Dillman, Forbes, Brieanne, Goldman, Amy E., Simhan, Manasi, Barnes, Morgan, Laan, Maggi, McKever, Sophia A., Hou, Zhangshuan, Chen, Xingyuan, and Scheibe, Timothy D. 2023. "Artificial intelligence models, photos, and data associated with the manuscript “Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO” (v2)". United States. doi:https://doi.org/10.15485/1999774. https://www.osti.gov/servlets/purl/1999774. Pub date:Tue Sep 12 00:00:00 EDT 2023
@article{osti_1999774,
title = {Artificial intelligence models, photos, and data associated with the manuscript “Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO” (v2)},
author = {Chen, Yunxiang and Bao, Jie and Chen, Yao and Li, Bing and Yang, Yuan and Stegen, James C. and Renteria, Lupita and Delgado, Dillman and Forbes, Brieanne and Goldman, Amy E. and Simhan, Manasi and Barnes, Morgan and Laan, Maggi and McKever, Sophia A. and Hou, Zhangshuan and Chen, Xingyuan and Scheibe, Timothy D.},
abstractNote = {This data package is associated with the manuscript “Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO” published in Water Resources Research (Chen et al., 2024). This data package includes the training, validation, testing, and prediction data used by the artificial intelligence (AI) model for automated grain size and hydro-biogeochemistry quantification using streambed photos. The grain size data are extracted for each photo using You Only Look Once (YOLO), a pre-trained object detection model. This data package was originally published in October 2023. It was updated in August 2025 (v2; new and modified files). File and folder names were not revised to indicate changes. See the change history section in the readme for more details. Please see flmd.csv for a list of all files contained in this data package and descriptions for each. Please see dd.csv for a data dictionary that defines the column headers of .csv files in the data package. This dataset comprises one data folder containing (1) file-level metadata; (2) data dictionary; (3) readme; and (4) six subfolders. Subfolders 1 to 4 include the training, validation, testing, and prediction data. Subfolder 5_Summary includes the summary results of different combinations of training, validation, testing, and prediction data. Subfolder 6_SupplementalData includes additional data downloaded from public sources (Kaufman et al., 2023a; Kaufman et al., 2023b; Garefalakis et al., 2023; Mair et al., 2024; https://github.com/river-corridors-sfa/Geospatial_variables). In total, the data package includes 110 folders and 44,283 files. These files include 9,047 .jpg photos, 1 .png photo, and 3 .tif photos; 26,639 files of photo labels and AI-predicted individual grain sizes and probabilities (.txt); 8,447 grain size distribution data files (.dat); 126 CSV files for results summary; and 14 required metadata files (.xlsx).
The summary CSV files contain 68 columns and approximately 2,200 rows that represent photo names, site locations, recording time, GPS coordinates, grain sizes (D10, D50, D60, and D84), number of grains, and additional hydro-biogeochemical data such as water depth, flow velocity, Manning’s coefficient, friction factor, hydraulic conductivity, permeability, streambed interstitial velocity magnitude, mass transfer rate, and nitrate uptake velocity. The photos were obtained from 75 sites in the Yakima River Basin and the Columbia River shorelines, and other associated data from samples and sensors obtained when the photos were taken are publicly available (Fulton et al. 2022; Grieger et al. 2023). All files are .csv, .txt, .dat, .jpg, or .pdf.},
doi = {10.15485/1999774},
place = {United States},
year = {2023},
month = sep
}