DOE PAGES
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Contaminant source identification using semi-supervised machine learning

Abstract

Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may need to be simulated in these models which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that performs decomposition of the observation mixtures based on Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. NMFk is tested on synthetic and real-world site data. Finally, the NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).
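The decomposition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' NMFk implementation: it assumes plain NMF with Lee-Seung multiplicative updates, a synthetic data set of 20 wells, 5 constituents, and 3 hidden sources, and it compares candidate source counts by reconstruction error only. The custom clustering step that NMFk adds across repeated runs is omitted. All names (`nmf`, `relative_error`, `W_true`, `H_true`) are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=1000, seed=0, eps=1e-9):
    """Approximate X (wells x constituents) as W @ H with W, H >= 0.

    W holds mixing ratios (wells x k); H holds source signatures
    (k x constituents). Uses Lee-Seung multiplicative updates for the
    Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(X, W, H):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


# Synthetic mixtures (assumed for illustration): 3 sources, 5 constituents.
rng = np.random.default_rng(42)
H_true = rng.random((3, 5)) * 10.0            # source concentrations
W_true = rng.dirichlet(np.ones(3), size=20)   # mixing ratios, rows sum to 1
X = W_true @ H_true                           # observed mixtures at 20 wells

# Try several candidate source counts, keeping the best of a few restarts.
errors = {}
for k in range(1, 6):
    errors[k] = min(relative_error(X, *nmf(X, k, seed=s)) for s in range(5))

for k, err in errors.items():
    print(f"k={k}: relative reconstruction error = {err:.4f}")
```

In this sketch the error should drop sharply as k approaches the true number of sources and then flatten. Reconstruction error alone cannot pick k reliably, which is why the abstract's method pairs NMF with a clustering step.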

Authors:
Vesselinov, Velimir Valentinov [1]; Alexandrov, Boian S. [1]; O’Malley, Dan [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
2017-11-08
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE Office of Environmental Management (EM)
OSTI Identifier:
1408837
Alternate Identifier(s):
OSTI ID: 1526792
Report Number(s):
LA-UR-17-23269
Journal ID: ISSN 0169-7722; TRN: US1703077
Grant/Contract Number:  
AC52-06NA25396; 11145687
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Contaminant Hydrology
Additional Journal Information:
Journal Volume: 212; Journal ID: ISSN 0169-7722
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
54 ENVIRONMENTAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Earth Sciences; Mathematics; Non-negative matrix factorization; Feature Extraction; Blind Source Separation; Robustness analysis; Semi-supervised learning; Groundwater contamination; Source identification; Advection-diffusion transport; Geochemical signatures

Citation Formats

Vesselinov, Velimir Valentinov, Alexandrov, Boian S., and O’Malley, Dan. Contaminant source identification using semi-supervised machine learning. United States: N. p., 2017. Web. doi:10.1016/j.jconhyd.2017.11.002.
Vesselinov, Velimir Valentinov, Alexandrov, Boian S., & O’Malley, Dan. Contaminant source identification using semi-supervised machine learning. United States. https://doi.org/10.1016/j.jconhyd.2017.11.002
Vesselinov, Velimir Valentinov, Alexandrov, Boian S., and O’Malley, Dan. 2017. "Contaminant source identification using semi-supervised machine learning". United States. https://doi.org/10.1016/j.jconhyd.2017.11.002. https://www.osti.gov/servlets/purl/1408837.
@article{osti_1408837,
title = {Contaminant source identification using semi-supervised machine learning},
author = {Vesselinov, Velimir Valentinov and Alexandrov, Boian S. and O’Malley, Dan},
abstractNote = {Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may need to be simulated in these models which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that performs decomposition of the observation mixtures based on Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. NMFk is tested on synthetic and real-world site data. Finally, the NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).},
doi = {10.1016/j.jconhyd.2017.11.002},
journal = {Journal of Contaminant Hydrology},
volume = 212,
place = {United States},
year = {2017},
month = {11}
}

Citation Metrics:
Cited by: 33 works
Citation information provided by
Web of Science
