Machine learning model inputs, outputs, and scripts associated with “Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions”
Abstract
NOTE: The manuscript associated with this data package is currently in review. The data may be revised based on reviewer feedback. Upon manuscript acceptance, this data package will be updated with the final dataset and additional metadata. This data package is associated with the manuscript “Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions” (Malhotra et al., in prep). This effort was designed following ICON (integrated, coordinated, open, and networked) principles to facilitate a model-experiment (ModEx) iteration approach, leveraging crowdsourced sampling across the contiguous United States (CONUS). New machine learning models were created every month to guide sampling locations. Data from the resulting samples were used to test and rebuild the machine learning models for the next round of sampling guidance. Associated sediment and water geochemistry and in situ sensor data can be found at https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1923689, https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1729719, and https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1603775. This data package is associated with two GitHub repositories found at https://github.com/parallelworks/dynamic-learning-rivers and https://github.com/WHONDRS-Hub/ICON-ModEx_Open_Manuscript. In addition to this readme, this data package also includes two file-level metadata (FLMD) files that describe each file and two data dictionaries (DD) that describe all column/row headers and variable definitions. This data package consists of two main folders, (1) dynamic-learning-rivers and (2) ICON-ModEx_Open_Manuscript, which contain snapshots of the associated GitHub repositories.
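The monthly cycle described in the abstract (build a model, use it to guide sampling, test it against the new samples, rebuild) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the archived repositories: the names `true_rate`, `predict`, and `modex_iterations` are hypothetical, the nearest-neighbor "model" stands in for the real ML models, and the farthest-from-existing-data rule stands in for whatever site-selection criterion the project actually used.

```python
def true_rate(x):
    """Hypothetical stand-in for a field measurement at site coordinate x."""
    return (x - 0.3) ** 2


def distance_to_nearest(samples, x):
    """Distance from x to the closest site that has already been sampled."""
    return min(abs(site - x) for site, _ in samples)


def predict(samples, x):
    """Toy model: predict the value observed at the nearest sampled site."""
    _, value = min(samples, key=lambda s: abs(s[0] - x))
    return value


def modex_iterations(candidates, initial_sites, rounds):
    """Run `rounds` guide -> sample -> test -> rebuild iterations."""
    samples = [(x, true_rate(x)) for x in initial_sites]
    errors = []
    for _ in range(rounds):
        sampled = {site for site, _ in samples}
        unsampled = [x for x in candidates if x not in sampled]
        # Guide: choose the candidate least covered by existing data
        # (an assumed selection rule, used here only for illustration).
        site = max(unsampled, key=lambda x: distance_to_nearest(samples, x))
        # Sample: "collect" the new observation.
        observed = true_rate(site)
        # Test: score the current model on data it has not seen yet.
        errors.append(abs(predict(samples, site) - observed))
        # Rebuild: the toy model retrains simply by absorbing the sample.
        samples.append((site, observed))
    return samples, errors


candidates = [i / 10 for i in range(11)]
samples, errors = modex_iterations(candidates, initial_sites=[0.0], rounds=4)
for (site, _), err in zip(samples[1:], errors):
    print(f"sampled site {site:.1f}, pre-sampling model error {err:.3f}")
```

The point of the sketch is the ordering: each new observation is used first to test the existing model and only afterwards to rebuild it, so the recorded errors reflect predictions on unseen data.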
- Authors:
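- Forbes, Brieanne; Bruen, Michael; Fluet-Chouinard, Etienne; Goldman, Amy E.; Garayburu-Caruso, Vanessa A.; Gary, Stefan; Malhotra, Avni; Mehan, Sushant; Rubin, Tod; Scheibe, Timothy D.; Ward, Nicholas; Rivera Waterman, Bre; Stegen, James C.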
- Pacific Northwest National Laboratory
- University College Dublin
- Parallel Works Inc.
- South Dakota State University
- The Watershed Project
- University of Kansas
- Publication Date:
- DOE Contract Number:
- AC02-05CH11231
- Research Org.:
- River Corridor Hydro-biogeochemistry from Molecular to Multi-Basin Scales SFA
- Sponsoring Org.:
- ESS-DIVE; U.S. DOE > Office of Science > Biological and Environmental Research (BER)
- Subject:
- 54 ENVIRONMENTAL SCIENCES; AI; Artificial Intelligence; CONUS; Catchment; Climate; Contiguous United States; Coordinated; ESS-DIVE CSV File Formatting Guidelines Reporting Format; ESS-DIVE File Level Metadata Reporting Format; ESS-DIVE Model Data Archiving Guidelines; Hyporheic zone; ICON; Integrated; Landuse; ML; Machine Learning; ModEx; Model-experiment iteration; Networked; Open; Respiration; River; River corridor; Sediment respiration rate; Stream; WHONDRS; Watershed
- OSTI Identifier:
- 2998468
- DOI:
- https://doi.org/10.15485/2998468
Citation Formats
Forbes, Brieanne, Bruen, Michael, Fluet-Chouinard, Etienne, Goldman, Amy E., Garayburu-Caruso, Vanessa A., Gary, Stefan, Malhotra, Avni, Mehan, Sushant, Rubin, Tod, Scheibe, Timothy D., Ward, Nicholas, Rivera Waterman, Bre, and Stegen, James C. Machine learning model inputs, outputs, and scripts associated with “Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions”. United States: N. p., 2025. Web. doi:10.15485/2998468.
Forbes, Brieanne, Bruen, Michael, Fluet-Chouinard, Etienne, Goldman, Amy E., Garayburu-Caruso, Vanessa A., Gary, Stefan, Malhotra, Avni, Mehan, Sushant, Rubin, Tod, Scheibe, Timothy D., Ward, Nicholas, Rivera Waterman, Bre, & Stegen, James C. Machine learning model inputs, outputs, and scripts associated with “Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions”. United States. doi:https://doi.org/10.15485/2998468
Forbes, Brieanne, Bruen, Michael, Fluet-Chouinard, Etienne, Goldman, Amy E., Garayburu-Caruso, Vanessa A., Gary, Stefan, Malhotra, Avni, Mehan, Sushant, Rubin, Tod, Scheibe, Timothy D., Ward, Nicholas, Rivera Waterman, Bre, and Stegen, James C. 2025. "Machine learning model inputs, outputs, and scripts associated with ‘Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions’". United States. doi:https://doi.org/10.15485/2998468. https://www.osti.gov/servlets/purl/2998468. Pub date: Wed Jan 01 04:00:00 UTC 2025
@article{osti_2998468,
title = {Machine learning model inputs, outputs, and scripts associated with “Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions”},
author = {Forbes, Brieanne and Bruen, Michael and Fluet-Chouinard, Etienne and Goldman, Amy E. and Garayburu-Caruso, Vanessa A. and Gary, Stefan and Malhotra, Avni and Mehan, Sushant and Rubin, Tod and Scheibe, Timothy D. and Ward, Nicholas and Rivera Waterman, Bre and Stegen, James C.},
abstractNote = {NOTE: The manuscript associated with this data package is currently in review. The data may be revised based on reviewer feedback. Upon manuscript acceptance, this data package will be updated with the final dataset and additional metadata. This data package is associated with the manuscript “Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions” (Malhotra et al., in prep). This effort was designed following ICON (integrated, coordinated, open, and networked) principles to facilitate a model-experiment (ModEx) iteration approach, leveraging crowdsourced sampling across the contiguous United States (CONUS). New machine learning models were created every month to guide sampling locations. Data from the resulting samples were used to test and rebuild the machine learning models for the next round of sampling guidance. Associated sediment and water geochemistry and in situ sensor data can be found at https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1923689, https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1729719, and https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1603775. This data package is associated with two GitHub repositories found at https://github.com/parallelworks/dynamic-learning-rivers and https://github.com/WHONDRS-Hub/ICON-ModEx_Open_Manuscript. In addition to this readme, this data package also includes two file-level metadata (FLMD) files that describe each file and two data dictionaries (DD) that describe all column/row headers and variable definitions. This data package consists of two main folders, (1) dynamic-learning-rivers and (2) ICON-ModEx_Open_Manuscript, which contain snapshots of the associated GitHub repositories. The input data, output data, and machine learning models used to guide sampling locations are within dynamic-learning-rivers.
The folder is organized into five top-level directories: (1) “input_data” holds the training data for the machine learning (ML) models; (2) “ml_models” holds the ML models trained on the data in “input_data”; (3) “examples” contains files for direct experimentation with the ML model, including scripts for setting up a “hindcast” run; (4) “scripts” contains data preprocessing and postprocessing scripts and intermediate results specific to this data set that bookend the ML workflow; and (5) “output_data” holds the overall results of the ML model on that branch. Each trained ML model resides on its own branch in the repository; this means that inputs and outputs can differ from branch to branch. There is also one hidden directory, “.github/workflows”. This hidden directory contains information on how to run the ML workflow as an end-to-end automated GitHub Action, but it is not needed for reusing the ML models archived here. Please see the top-level README.md in the GitHub repository for more details on the automation. The scripts and data used to create figures in the manuscript are within ICON-ModEx_Open_Manuscript. The folder is organized into four folders, which contain the scripts, data, and PDF for each figure. Within the “fig-model-score-evolution” folder, there is a folder called “intermediate_branch_data” which contains some intermediate files pulled from dynamic-learning-rivers and reorganized to easily integrate into the workflows. NOTE: THIS FOLDER INCLUDES THE FILES AT THE POINT OF PAPER SUBMISSION. IT WILL BE UPDATED ONCE THE PAPER IS ACCEPTED WITH ANY REVISIONS AND WILL INCLUDE A DD/FLMD AT THAT POINT.
We thank the United States Forest Service, Washington Department of Fish and Wildlife, Washington Department of Natural Resources, Cowiche Canyon Conservancy, Washington State Parks and Recreation Commission (Scientific Research Permit #210901), and the Confederated Tribes and Bands of the Yakama Nation for access to field locations where the samples labeled “SSS” were collected. We also thank the Yakama Nation Tribal Council and Yakama Nation Fisheries for working with us to facilitate sample collection and optimization of data usage according to their values and worldview. WHONDRS consortium members were asked to provide any acknowledgments for the collection of samples labeled “CM” and the following is a list of acknowledgments that were submitted with their corresponding Site IDs: (MART) Research activities were conducted in part on the Wind River Experimental Forest within the Gifford Pinchot National Forest; (MP-100379) Philadelphia is part of Lenapehoking, the ancestral homelands of the Lenape peoples; (MP-102398) Land surveyed is the ancestral homelands of the Nookhose'iinenno (Arapaho), Tsis tsis'tas (Cheyenne), and Nuuchu (Ute); (MP-100749 and MP-100747) Georgia Coastal Ecosystem LTER, OCE-1832178; (SP-70 and SP-72) Eastern Shoshone, Shoshone-Bannock; (MP-102944) Funded by Oregon Watershed Enhancement Board. On the traditional lands of the Confederated Tribes of the Siletz, Confederated Tribes of the Grand Ronde, and the Clatsop-Nehalem Confederated Tribe; (MP-100607) Holiday Creek is located on the traditional territory of the Monacan Indian Nation; (SP-45) Lafayette Blue Springs State Park; (MP-102420) NSF DEB-2016749; (MP-100019) New Hampshire Agriculture Experiment Station; (SP-35) Rayonier (land owner; https://www.rayonier.com/); (MP-101276) US Department of Energy, Office of Science, Biological and Environmental Research, Subsurface Biogeochemical Research, Watershed Dynamics and Evolution SFA at ORNL; (MP-103224) Watershed Dynamics and Evolution SFA at ORNL; (MP-101584) Traditional lands of the Oceti Sakowin (Dakota, Lakota, Nakoda) and Anishinaabe Peoples.},
doi = {10.15485/2998468},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2025},
month = {1}
}
