
Title: Nonnegative tensor factorization for contaminant source identification

Journal Article · Journal of Contaminant Hydrology

Unsupervised Machine Learning (ML) is becoming increasingly popular for solving data analytics problems, including feature extraction, blind source separation, exploratory analyses, and model diagnostics. In this work, we have developed a new unsupervised ML method based on Nonnegative Tensor Factorization (NTF) for identifying the original groundwater types (including contaminant sources) present in geochemical mixtures observed in an aquifer. Frequently, groundwater types with different geochemical signatures are related to different background and/or contamination sources. Characterizing groundwater mixing processes is a challenging but critical task for any environmental management project that aims to characterize the fate and transport of contaminants in the subsurface and to perform contaminant remediation. This task typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous geochemical constituents and processes may need to be simulated in these models, which further complicates the analyses. Additionally, the application of inverse methods may introduce biases into the analyses through the assumptions made during model development. Here, we substitute the model inversion with unsupervised ML analysis. The ML analysis does not make any assumptions about the underlying physical and geochemical processes occurring in the aquifer.
Our ML methodology, called NTFk, is capable of identifying (1) the unknown number of groundwater types (contaminant sources) present in the aquifer, (2) the original geochemical concentrations (signatures) of these groundwater types, and (3) the spatial and temporal dynamics of the mixing of these groundwater types. These results are obtained only from the measured geochemical data, without any additional site information. In general, the NTFk methodology allows for interpretation of large high-dimensional datasets representing diverse spatial and temporal components such as state variables and velocities. NTFk has been tested on three-dimensional synthetic and real-world site datasets. Finally, the NTFk algorithm is designed to work with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard-normalized stable isotope ratios).
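
The general idea of a nonnegative tensor factorization can be illustrated with a minimal Python sketch. This is not the NTFk algorithm itself: it is a generic nonnegative CP (CANDECOMP/PARAFAC) decomposition fitted with multiplicative updates, and it assumes the number of groundwater types (`rank`) is already known, whereas NTFk estimates that number from the data. The tensor layout (wells × sampling times × geochemical species), the function names, and the synthetic data are all illustrative assumptions rather than details taken from the article.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R) -> (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the other two axes."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def reconstruct(factors):
    """Rebuild the 3-way tensor from its three factor matrices."""
    return np.einsum("ir,jr,kr->ijk", *factors)


def nonneg_cp(x, rank, n_iter=1000, seed=0, eps=1e-12):
    """Nonnegative CP decomposition of a 3-way tensor by multiplicative updates.

    Generic illustration only; `rank` (the assumed number of groundwater
    types) must be supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(x, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            # Multiplying by a nonnegative ratio keeps every entry nonnegative.
            factors[mode] *= numer / denom
    return factors


# Synthetic example (hypothetical sizes): 6 wells, 5 sampling times,
# 4 species, generated by mixing 2 nonnegative "groundwater types".
rng = np.random.default_rng(42)
true_factors = [rng.random((n, 2)) for n in (6, 5, 4)]
data = reconstruct(true_factors)

est_factors = nonneg_cp(data, rank=2)
rel_err = np.linalg.norm(reconstruct(est_factors) - data) / np.linalg.norm(data)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Under these assumptions, the species factor plays the role of the source signatures, while the well and time factors describe how the sources mix in space and time. Repeating the fit for several values of `rank` and comparing reconstruction errors is one simple way to explore how many sources the data support, although it is not necessarily the selection criterion used by NTFk.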

Research Organization:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
89233218CNA000001; 20180060DR; 20190020DR; #11145687
OSTI ID:
1489953
Alternate ID(s):
OSTI ID: 1636293
Report Number(s):
LA-UR-18-22259
Journal Information:
Journal of Contaminant Hydrology, Vol. 220; ISSN 0169-7722
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 10 works
Citation information provided by Web of Science