Dataset for "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification" Willard et al. (2025).
Abstract
This data release provides all data and code used in the paper "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification" (Willard et al., 2025) to model stream temperature and to evaluate and assess the results. The associated manuscript explores the effect of different ensemble construction techniques across common machine learning (ML) architectures for predictions in unmonitored basins. Modeling was done using long short-term memory (LSTM), gated recurrent unit (GRU), temporal convolutional network (TCN), and extreme gradient boosting (XGBoost) models, and stream site coverage spans 1362 locations across the conterminous United States. The ensemble construction techniques investigated include ensembles built by random weight initialization, differing hyperparameters, different random subsets of training data, different subselections of input features, different architectures, and Monte Carlo Dropout. The data is organized into the following items:
Code: stream_temp_ml_regionalization.zip contains the code repository.
Data to run the code:
- data_dir.zip: contains all files that should be moved to the directory set by the "DATA_DIR" variable defined in the "set_env_vars.sh" script in the code repository
- metadata_dir.zip: contains all files that should be moved to the directory set by the "METADATA_DIR" variable defined in the "set_env_vars.sh" script
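The placement step described for data_dir.zip and metadata_dir.zip could be scripted roughly as follows. This is a minimal sketch, not part of the released code: the helper name `extract_to_env_dir` is hypothetical, and it assumes that "DATA_DIR" and "METADATA_DIR" have already been exported into the environment (for example by sourcing "set_env_vars.sh") and that the archives can be unpacked directly into those directories. If an archive wraps its files in a top-level folder, the contents may need to be moved up one level afterwards.

```python
import os
import zipfile
from pathlib import Path


def extract_to_env_dir(archive_path: str, env_var: str) -> Path:
    """Extract a zip archive into the directory named by an environment variable.

    Hypothetical helper: the variable names come from the dataset description,
    but the layout inside each archive is an assumption.
    """
    target = os.environ.get(env_var)
    if not target:
        # The variable must be set in the calling environment beforehand.
        raise RuntimeError(f"{env_var} is not set; export it before running")
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
    return target_dir
```

With both variables exported, `extract_to_env_dir("data_dir.zip", "DATA_DIR")` and `extract_to_env_dir("metadata_dir.zip", "METADATA_DIR")` would unpack each archive into its configured location.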
- Authors:
- Willard, Jared; Varadharajan, Charuleka
- Lawrence Berkeley National Laboratory
- Publication Date:
- DOE Contract Number:
- AC02-05CH11231
- Research Org.:
- iNAIADS
- Sponsoring Org.:
- U.S. DOE > Office of Science > Biological and Environmental Research (BER)
- Subject:
- 54 ENVIRONMENTAL SCIENCES; DEEP LEARNING; EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SURFACE WATER; EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > WATER QUALITY/WATER CHEMISTRY; EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > WATER QUALITY/WATER CHEMISTRY > WATER CHARACTERISTICS > WATER TEMPERATURE; ENSEMBLE MODELING; MACHINE LEARNING; UNCERTAINTY QUANTIFICATION
- OSTI Identifier:
- 2527393
- DOI:
- https://doi.org/10.15485/2527393
Citation Formats
Willard, Jared, and Varadharajan, Charuleka. Dataset for "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification" Willard et al. (2025). United States: N. p., 2024. Web. doi:10.15485/2527393.
Willard, Jared, & Varadharajan, Charuleka. Dataset for "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification" Willard et al. (2025). United States. doi:https://doi.org/10.15485/2527393
Willard, Jared, and Varadharajan, Charuleka. 2024. "Dataset for 'Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification' Willard et al. (2025)." United States. doi:https://doi.org/10.15485/2527393. https://www.osti.gov/servlets/purl/2527393. Pub date: Tue Dec 31 23:00:00 EST 2024
@article{osti_2527393,
title = {Dataset for "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification" Willard et al. (2025).},
author = {Willard, Jared and Varadharajan, Charuleka},
abstractNote = {This data release provides all data and code used in the paper "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification" (Willard et al., 2025) to model stream temperature and to evaluate and assess the results. The associated manuscript explores the effect of different ensemble construction techniques across common machine learning (ML) architectures for predictions in unmonitored basins. Modeling was done using long short-term memory (LSTM), gated recurrent unit (GRU), temporal convolutional network (TCN), and extreme gradient boosting (XGBoost) models, and stream site coverage spans 1362 locations across the conterminous United States. The ensemble construction techniques investigated include ensembles built by random weight initialization, differing hyperparameters, different random subsets of training data, different subselections of input features, different architectures, and Monte Carlo Dropout. The data is organized into the following items. Code: stream_temp_ml_regionalization.zip contains the code repository. Data to run the code: data_dir.zip contains all files that should be moved to the directory set by the "DATA_DIR" variable defined in the "set_env_vars.sh" script in the code repository; metadata_dir.zip contains all files that should be moved to the directory set by the "METADATA_DIR" variable defined in the "set_env_vars.sh" script in the code repository. Data produced by the code and used in the paper: outputs_dir.zip contains model output and results (outputs_dir/results), model weights (outputs_dir/models), and all other outputs used for the paper, including feature importances. To cite this code, please use the following BibTeX or MLA entries. BibTeX: @misc{willard2025streamensembles, author = {Jared Willard and Charuleka Varadharajan}, title = {Dataset for "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification"}, year = {2024}, doi = {10.15485/2527393}, publisher = {ESS-DIVE Repository}, url = {https://data.ess-dive.lbl.gov/datasets/doi:10.15485/2527393}}. MLA: Willard, Jared, et al. Dataset for "Machine Learning Ensembles Can Enhance Hydrologic Predictions and Uncertainty Quantification". 2025. ESS-DIVE Repository, doi:10.15485/2448016.},
doi = {10.15485/2527393},
place = {United States},
year = {2024},
month = {12}
}
