Title: The value of human data annotation for machine learning based anomaly detection in environmental systems

Journal Article · Water Research
  1. Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf (Switzerland); Eidgenoessische Technische Hochschule (ETH), Zurich (Switzerland)
  2. onCyt Microbiology AG, Zürich (Switzerland)
  3. Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf (Switzerland)
  4. Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf (Switzerland); Lund Univ. (Sweden)
  5. Ecole Polytechnique Federale Lausanne (Switzerland)
  6. Université Savoie Mont Blanc, Thonon-les-Bains (France)
  7. Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf (Switzerland); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Anomaly detection is the process of identifying unexpected data samples in datasets. Automated anomaly detection is performed using either supervised machine learning models, which require a labelled dataset for their calibration, or unsupervised models, which do not require labels. While academic research has produced a vast array of tools and machine learning models for automated anomaly detection, the research community focused on environmental systems still lacks a comparative analysis that is simultaneously comprehensive, objective, and systematic. This study addresses that knowledge gap for the first time by evaluating 15 supervised and unsupervised anomaly detection models on 5 environmental datasets from engineered and natural aquatic systems. The evaluation takes into account anomaly detection performance, labelling effort, and the impact of model and algorithm tuning. The analysis reveals the relative strengths and weaknesses of the different approaches objectively, without bias toward any particular machine learning paradigm. Most importantly, the results show that expert-based data annotation is extremely valuable for machine learning based anomaly detection.
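The distinction between unsupervised and supervised detection described in the abstract can be illustrated with a minimal Python sketch. It is not one of the 15 models evaluated in the study. The z-score rule, the F1-tuned threshold, the function names, and the toy values and labels are all assumptions made here for illustration only. The unsupervised detector flags points far from the mean without seeing any labels, while the supervised detector uses expert labels to calibrate a threshold.

```python
from statistics import mean, stdev


def zscore_detector(values, k=2.0):
    """Unsupervised: flag points more than k sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) > k * sigma for v in values]


def f1(labels, preds):
    """F1 score for boolean labels and predictions (0.0 when there are no true positives)."""
    tp = sum(1 for l, p in zip(labels, preds) if l and p)
    fp = sum(1 for l, p in zip(labels, preds) if not l and p)
    fn = sum(1 for l, p in zip(labels, preds) if l and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fit_threshold(values, labels):
    """Supervised: pick the threshold t that maximises F1 for the rule `value >= t`."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(values)):
        score = f1(labels, [v >= t for v in values])
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t


# Hypothetical sensor readings (toy data, not from the study's datasets).
values = [1.0, 1.1, 0.9, 1.2, 1.0, 3.0, 1.1, 0.95, 2.6, 1.05]
# Hypothetical expert annotation: the readings 3.0 and 2.6 are anomalies.
labels = [False, False, False, False, False, True, False, False, True, False]

unsupervised_preds = zscore_detector(values)
threshold = fit_threshold(values, labels)
supervised_preds = [v >= threshold for v in values]

print("unsupervised F1:", f1(labels, unsupervised_preds))
print("supervised F1:  ", f1(labels, supervised_preds))
```

In this toy case the labels let the supervised rule place its threshold below the milder anomaly, which the fixed z-score rule misses. The supervised score is computed on the same data used for calibration, so it is optimistic, and the sketch says nothing about how the study's models compare on real data.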

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; Eawag Discretionary Funds
Grant/Contract Number:
AC05-00OR22725; 5221.00492.012.02
OSTI ID:
1827039
Journal Information:
Water Research, Vol. 206; ISSN 0043-1354
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
