OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S.

Abstract

Climate, groundwater extraction, and surface water flows have complex nonlinear relationships with groundwater level in agricultural regions. To better understand the relative importance of each driver and predict groundwater level change, we develop a new ensemble modeling framework based on spectral analysis, machine learning, and uncertainty analysis, as an alternative to complex and computationally expensive physical models. We apply and evaluate this new approach in the context of two aquifer systems supporting agricultural production in the United States: the High Plains aquifer (HPA) and the Mississippi River Valley alluvial aquifer (MRVA). We select input data sets by using a combination of mutual information, genetic algorithms, and lag analysis, and then use the selected data sets in a Multilayer Perceptron network architecture to simulate seasonal groundwater level change. As expected, model results suggest that irrigation demand has the highest influence on groundwater level change for a majority of the wells. The subset of groundwater observations not used in model training or cross-validation correlates strongly (R > 0.8) with model results for 88 and 83% of the wells in the HPA and MRVA, respectively. In both aquifer systems, the error in the modeled cumulative groundwater level change during testing (2003-2012) was less than 2 m over a majority of the area. Here, we conclude that our modeling framework can serve as an alternative approach to simulating groundwater level change and water availability, especially in regions where subsurface properties are unknown.
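The abstract names mutual information and lag analysis as input-selection tools and reports the share of wells whose held-out observations correlate with the model at R > 0.8. The Python sketch below illustrates those two ideas under stated assumptions: a plug-in (histogram) mutual-information estimator with equal-width bins (8 bins is an arbitrary choice) picks the lag at which a driver series is most informative about groundwater level change, and a Pearson-R tally produces the kind of per-well summary the abstract reports. All function names are hypothetical, and the sketch omits the genetic-algorithm search, the Multilayer Perceptron, and the ensemble and uncertainty analysis; it is not the authors' implementation.

```python
import math
from collections import Counter


def _bin(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in nats."""
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    n = len(x)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def best_lag(driver, target, max_lag, n_bins=8):
    """Return the lag k in 0..max_lag whose shifted driver shares the most MI with target."""
    scores = {}
    for k in range(max_lag + 1):
        # pair driver at time t-k with target at time t
        scores[k] = mutual_information(driver[: len(driver) - k], target[k:], n_bins)
    return max(scores, key=scores.get), scores


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)


def fraction_wells_above(obs_by_well, sim_by_well, threshold=0.8):
    """Share of wells whose held-out observations correlate with the model at R > threshold."""
    hits = sum(pearson_r(obs_by_well[w], sim_by_well[w]) > threshold for w in obs_by_well)
    return hits / len(obs_by_well)
```

For example, a series of irrigation demand could be passed as `driver` and seasonal groundwater level change as `target`; the returned `scores` dictionary shows how sharply the information peaks at the chosen lag. Histogram estimators of mutual information are biased upward for short records, so the bin count should stay small relative to the number of observations.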

Authors:
Sahoo, S. [1]; Russo, T. A. [1]; Elliott, J. [2]; Foster, I. [2]
  1. Pennsylvania State Univ., University Park, PA (United States)
  2. Univ. of Chicago, Chicago, IL (United States); Argonne National Lab. (ANL), Lemont, IL (United States)
Publication Date:
2017-05-13
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1357813
Alternate Identifier(s):
OSTI ID: 1357814; OSTI ID: 1398994
Grant/Contract Number:
AC02-06CH11357; 0951576
Resource Type:
Journal Article: Published Article
Journal Name:
Water Resources Research
Additional Journal Information:
Journal Volume: 53; Journal Issue: 5; Journal ID: ISSN 0043-1397
Publisher:
American Geophysical Union (AGU)
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 97 MATHEMATICS AND COMPUTING; 58 GEOSCIENCES

Citation Formats

Sahoo, S., Russo, T. A., Elliott, J., and Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S. United States: N. p., 2017. Web. doi:10.1002/2016WR019933.
Sahoo, S., Russo, T. A., Elliott, J., & Foster, I. (2017). Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S. United States. doi:10.1002/2016WR019933.
Sahoo, S., Russo, T. A., Elliott, J., and Foster, I. 2017. "Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S." United States. doi:10.1002/2016WR019933.
@article{osti_1357813,
title = {Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S.},
author = {Sahoo, S. and Russo, T. A. and Elliott, J. and Foster, I.},
abstractNote = {Climate, groundwater extraction, and surface water flows have complex nonlinear relationships with groundwater level in agricultural regions. To better understand the relative importance of each driver and predict groundwater level change, we develop a new ensemble modeling framework based on spectral analysis, machine learning, and uncertainty analysis, as an alternative to complex and computationally expensive physical models. We apply and evaluate this new approach in the context of two aquifer systems supporting agricultural production in the United States: the High Plains aquifer (HPA) and the Mississippi River Valley alluvial aquifer (MRVA). We select input data sets by using a combination of mutual information, genetic algorithms, and lag analysis, and then use the selected data sets in a Multilayer Perceptron network architecture to simulate seasonal groundwater level change. As expected, model results suggest that irrigation demand has the highest influence on groundwater level change for a majority of the wells. The subset of groundwater observations not used in model training or cross-validation correlates strongly (R > 0.8) with model results for 88 and 83% of the wells in the HPA and MRVA, respectively. In both aquifer systems, the error in the modeled cumulative groundwater level change during testing (2003-2012) was less than 2 m over a majority of the area. Here, we conclude that our modeling framework can serve as an alternative approach to simulating groundwater level change and water availability, especially in regions where subsurface properties are unknown.},
doi = {10.1002/2016WR019933},
journal = {Water Resources Research},
number = 5,
volume = 53,
place = {United States},
year = {2017},
month = {May}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at 10.1002/2016WR019933
