OSTI.GOV, U.S. Department of Energy Office of Scientific and Technical Information

Title: Contaminant source identification using semi-supervised machine learning

Journal Article · Journal of Contaminant Hydrology

Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous geochemical constituents and processes may need to be simulated in these models, which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that decomposes the observed mixtures using the Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentrations of the contaminant sources from measured geochemical mixtures with unknown mixing ratios, without any additional site information. NMFk is tested on synthetic and real-world site data. Finally, the NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).
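The core idea, factoring a nonnegative matrix of observed mixtures into nonnegative mixing ratios and source signatures and then checking which number of sources gives factorizations that are both accurate and reproducible, can be illustrated with a small sketch. This is not the authors' NMFk implementation. It assumes the Lee-Seung multiplicative-update rule for NMF, and in place of the paper's custom semi-supervised clustering it uses a simple greedy cosine-similarity matching of signatures across random restarts. The function names `nmf`, `nmfk_scores`, and `make_synthetic_mixtures`, the stability score, and the synthetic source concentrations are all hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-12):
    """Factor X (wells x constituents) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the squared Frobenius loss.
    W holds (unnormalized) mixing ratios; H holds source signatures.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def _unit_rows(H):
    # Scale each source signature to unit length so cosine similarity
    # ignores the scale ambiguity between W and H.
    return H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)


def nmfk_scores(X, k, n_runs=10, n_iter=500):
    """Hypothetical, simplified NMFk-style diagnostics for one candidate k.

    Returns (best relative reconstruction error over all runs,
             worst mean cosine similarity of a signature cluster).
    """
    runs = [nmf(X, k, n_iter, seed=s) for s in range(n_runs)]
    errors = [np.linalg.norm(X - W @ H) / np.linalg.norm(X) for W, H in runs]
    ref = _unit_rows(runs[0][1])  # run 0 serves as the reference clustering
    sims = np.zeros((n_runs, k))
    for r, (_, H) in enumerate(runs):
        C = ref @ _unit_rows(H).T  # C[i, j]: reference signature i vs run signature j
        free_ref, free_run = set(range(k)), set(range(k))
        # Greedy one-to-one matching: each cluster receives exactly one
        # signature from every run (a stand-in for the paper's clustering).
        for _ in range(k):
            i, j = max(((a, b) for a in free_ref for b in free_run),
                       key=lambda p: C[p])
            sims[r, i] = C[i, j]
            free_ref.remove(i)
            free_run.remove(j)
    return min(errors), float(sims.mean(axis=0).min())


def make_synthetic_mixtures(n_wells=20, seed=42):
    """Hypothetical test case: 3 sources, 6 constituents, random mixing ratios."""
    H_true = np.array([[10.0, 0.5, 0.1, 2.0, 0.1, 1.0],
                       [0.2, 8.0, 0.3, 0.1, 4.0, 0.5],
                       [0.5, 0.1, 6.0, 1.0, 0.2, 5.0]])
    # Dirichlet rows are nonnegative and sum to 1, like true mixing ratios.
    W_true = np.random.default_rng(seed).dirichlet(np.ones(3), size=n_wells)
    return W_true @ H_true  # each row is one observed mixture


if __name__ == "__main__":
    X = make_synthetic_mixtures()
    for k in range(2, 6):
        err, stability = nmfk_scores(X, k)
        print(f"k={k}: relative error={err:.4f}, min cluster similarity={stability:.3f}")
```

Under this sketch, a reasonable selection rule is to prefer the smallest k whose reconstruction error is low while every signature cluster stays tightly grouped across restarts: too small a k cannot reproduce the mixtures, and too large a k tends to split sources into signatures that change from run to run. Note that plain NMF recovers `H` only up to row scaling and permutation, so converting it into source concentrations requires an additional normalization (for example, rescaling each row of `W` to sum to 1 and compensating in `H`).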

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Environmental Management (EM)
Grant/Contract Number:
AC52-06NA25396; 11145687
OSTI ID:
1408837
Alternate ID(s):
OSTI ID: 1526792
Report Number(s):
LA-UR-17-23269; TRN: US1703077
Journal Information:
Journal of Contaminant Hydrology, Vol. 212; ISSN 0169-7722
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 33 works (citation information provided by Web of Science)

Cited By (5)

Unsupervised phase mapping of X-ray diffraction data by nonnegative matrix factorization integrated with custom clustering (journal article, August 2018)
A Comparison of Machine-Learning Methods to Select Socioeconomic Indicators in Cultural Landscapes (journal article, November 2018)
Distributed non-negative matrix factorization with determination of the number of latent features (journal article, February 2020)
Unsupervised machine learning based on non-negative tensor factorization for analyzing reactive-mixing (journal article, October 2019)
Targeted Source Detection for Environmental Data (preprint, January 2019)