U.S. Department of Energy
Office of Scientific and Technical Information

Can machine learning accelerate process understanding and decision‐relevant predictions of river water quality?

Journal Article · Hydrological Processes
DOI: https://doi.org/10.1002/hyp.14565 · OSTI ID: 1864613
 [1];  [2];  [1];  [3];  [3];  [4];  [5];  [3];  [2];  [1];  [3];  [2];  [1];  [4];  [1];  [2]
  1. Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, California 94010, USA
  2. U.S. Geological Survey, Water Mission Area, Reston, Virginia, USA
  3. Computing Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
  4. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
  5. Aquatic Informatics, Vancouver, BC V6E 4M3, Canada
Abstract

The global decline of water quality in rivers and streams has resulted in a pressing need to design new watershed management strategies. Water quality can be affected by multiple stressors including population growth, land use change, global warming, and extreme events, with repercussions on human and ecosystem health. A scientific understanding of factors affecting riverine water quality, and predictions at local to regional scales and at sub‐daily to decadal timescales, are needed for optimal management of watersheds and river basins. Here, we discuss how machine learning (ML) can enable development of more accurate, computationally tractable, and scalable models for analysis and predictions of river water quality. We review relevant state‐of‐the‐art applications of ML for water quality models and discuss opportunities to improve the use of ML with emerging computational and mathematical methods for model selection, hyperparameter optimization, incorporating process knowledge into ML models, improving explainability, uncertainty quantification, and model‐data integration. We then present considerations for using ML to address water quality problems given their scale and complexity, available data and computational resources, and stakeholder needs. When combined with decades of process understanding, interdisciplinary advances in knowledge‐guided ML, information theory, data integration, and analytics can help address fundamental science questions and enable decision‐relevant predictions of riverine water quality.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
National Science Foundation (NSF); US Geological Survey; USDOE; USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1864613
Alternate ID(s):
OSTI ID: 1865216
OSTI ID: 1869671
Journal Information:
Hydrological Processes, Vol. 36, Issue 4; ISSN 0885-6087
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English
