skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S.

Abstract

Climate, groundwater extraction, and surface water flows have complex nonlinear relationships with groundwater level in agricultural regions. To better understand the relative importance of each driver and predict groundwater level change, we develop a new ensemble modeling framework based on spectral analysis, machine learning, and uncertainty analysis, as an alternative to complex and computationally expensive physical models. We apply and evaluate this new approach in the context of two aquifer systems supporting agricultural production in the United States: the High Plains aquifer (HPA) and the Mississippi River Valley alluvial aquifer (MRVA). We select input data sets by using a combination of mutual information, genetic algorithms, and lag analysis, and then use the selected data sets in a Multilayer Perceptron network architecture to simulate seasonal groundwater level change. As expected, model results suggest that irrigation demand has the highest influence on groundwater level change for a majority of the wells. The subset of groundwater observations not used in model training or cross-validation correlates strongly (R > 0.8) with model results for 88 and 83% of the wells in the HPA and MRVA, respectively. In both aquifer systems, the error in the modeled cumulative groundwater level change during testing (2003-2012) was lessmore » than 2 m over a majority of the area. Here, we conclude that our modeling framework can serve as an alternative approach to simulating groundwater level change and water availability, especially in regions where subsurface properties are unknown.« less

Authors:
ORCiD logo [1]; ORCiD logo [1]; ORCiD logo [2]; ORCiD logo [2]
  1. Pennsylvania State Univ., University Park, PA (United States)
  2. Univ. of Chicago, Chicago, IL (United States); Argonne National Lab. (ANL), Lemont, IL (United States)
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1357813
Grant/Contract Number:
AC02-06CH11357; 0951576
Resource Type:
Journal Article: Published Article
Journal Name:
Water Resources Research
Additional Journal Information:
Journal Volume: 53; Journal Issue: 5; Journal ID: ISSN 0043-1397
Publisher:
American Geophysical Union (AGU)
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 97 MATHEMATICS AND COMPUTING; 58 GEOSCIENCES

Citation Formats

Sahoo, S., Russo, T. A., Elliott, J., and Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S.. United States: N. p., 2017. Web. doi:10.1002/2016WR019933.
Sahoo, S., Russo, T. A., Elliott, J., & Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S.. United States. doi:10.1002/2016WR019933.
Sahoo, S., Russo, T. A., Elliott, J., and Foster, I. 2017. "Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S.". United States. doi:10.1002/2016WR019933.
@article{osti_1357813,
title = {Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S.},
author = {Sahoo, S. and Russo, T. A. and Elliott, J. and Foster, I.},
abstractNote = {Climate, groundwater extraction, and surface water flows have complex nonlinear relationships with groundwater level in agricultural regions. To better understand the relative importance of each driver and predict groundwater level change, we develop a new ensemble modeling framework based on spectral analysis, machine learning, and uncertainty analysis, as an alternative to complex and computationally expensive physical models. We apply and evaluate this new approach in the context of two aquifer systems supporting agricultural production in the United States: the High Plains aquifer (HPA) and the Mississippi River Valley alluvial aquifer (MRVA). We select input data sets by using a combination of mutual information, genetic algorithms, and lag analysis, and then use the selected data sets in a Multilayer Perceptron network architecture to simulate seasonal groundwater level change. As expected, model results suggest that irrigation demand has the highest influence on groundwater level change for a majority of the wells. The subset of groundwater observations not used in model training or cross-validation correlates strongly (R > 0.8) with model results for 88 and 83% of the wells in the HPA and MRVA, respectively. In both aquifer systems, the error in the modeled cumulative groundwater level change during testing (2003-2012) was less than 2 m over a majority of the area. Here, we conclude that our modeling framework can serve as an alternative approach to simulating groundwater level change and water availability, especially in regions where subsurface properties are unknown.},
doi = {10.1002/2016WR019933},
journal = {Water Resources Research},
number = 5,
volume = 53,
place = {United States},
year = 2017,
month = 5
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at 10.1002/2016WR019933

Save / Share:
  • Climate, groundwater extraction, and surface water flows have complex nonlinear relationships with groundwater level in agricultural regions. To better understand the relative importance of each driver and predict groundwater level change, we develop a new ensemble modeling framework based on spectral analysis, machine learning, and uncertainty analysis, as an alternative to complex and computationally expensive physical models. We apply and evaluate this new approach in the context of two aquifer systems supporting agricultural production in the United States: the High Plains aquifer (HPA) and the Mississippi River Valley alluvial aquifer (MRVA). We select input data sets by using a combinationmore » of mutual information, genetic algorithms, and lag analysis, and then use the selected data sets in a Multilayer Perceptron network architecture to simulate seasonal groundwater level change. As expected, model results suggest that irrigation demand has the highest influence on groundwater level change for a majority of the wells. The subset of groundwater observations not used in model training or cross-validation correlates strongly (R > 0.8) with model results for 88 and 83% of the wells in the HPA and MRVA, respectively. In both aquifer systems, the error in the modeled cumulative groundwater level change during testing (2003-2012) was less than 2 m over a majority of the area. Here, we conclude that our modeling framework can serve as an alternative approach to simulating groundwater level change and water availability, especially in regions where subsurface properties are unknown.« less
  • Reynolds Averaged Navier Stokes (RANS) models are widely used in industry to predict fluid flows, despite their acknowledged deficiencies. Not only do RANS models often produce inaccurate flow predictions, but there are very limited diagnostics available to assess RANS accuracy for a given flow configuration. If experimental or higher fidelity simulation results are not available for RANS validation, there is no reliable method to evaluate RANS accuracy. This paper explores the potential of utilizing machine learning algorithms to identify regions of high RANS uncertainty. Three different machine learning algorithms were evaluated: support vector machines, Adaboost decision trees, and random forests.more » The algorithms were trained on a database of canonical flow configurations for which validated direct numerical simulation or large eddy simulation results were available, and were used to classify RANS results on a point-by-point basis as having either high or low uncertainty, based on the breakdown of specific RANS modeling assumptions. Classifiers were developed for three different basic RANS eddy viscosity model assumptions: the isotropy of the eddy viscosity, the linearity of the Boussinesq hypothesis, and the non-negativity of the eddy viscosity. It is shown that these classifiers are able to generalize to flows substantially different from those on which they were trained. As a result, feature selection techniques, model evaluation, and extrapolation detection are discussed in the context of turbulence modeling applications.« less
  • A machine learning–based framework for modeling the error introduced by surrogate models of parameterized dynamical systems is proposed. The framework entails the use of high-dimensional regression techniques (eg, random forests, and LASSO) to map a large set of inexpensively computed “error indicators” (ie, features) produced by the surrogate model at a given time instance to a prediction of the surrogate-model error in a quantity of interest (QoI). This eliminates the need for the user to hand-select a small number of informative features. The methodology requires a training set of parameter instances at which the time-dependent surrogate-model error is computed bymore » simulating both the high-fidelity and surrogate models. Using these training data, the method first determines regression-model locality (via classification or clustering) and subsequently constructs a “local” regression model to predict the time-instantaneous error within each identified region of feature space. We consider 2 uses for the resulting error model: (1) as a correction to the surrogate-model QoI prediction at each time instance and (2) as a way to statistically model arbitrary functions of the time-dependent surrogate-model error (eg, time-integrated errors). We then apply the proposed framework to model errors in reduced-order models of nonlinear oil-water subsurface flow simulations, with time-varying well-control (bottom-hole pressure) parameters. The reduced-order models used in this work entail application of trajectory piecewise linearization in conjunction with proper orthogonal decomposition. Moreover, when the first use of the method is considered, numerical experiments demonstrate consistent improvement in accuracy in the time-instantaneous QoI prediction relative to the original surrogate model, across a large number of test cases. When the second use is considered, results show that the proposed method provides accurate statistical predictions of the time- and well-averaged errors.« less
  • In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalue/eigenvectors of the graph Laplacian and this is a self-contained module that can be used in conjunction with other graph-Laplacian based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access of the serial codes and use OpenMP as the parallelization language to parallelizemore » the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.« less