OSTI.GOV, U.S. Department of Energy Office of Scientific and Technical Information

Title: Detecting outliers in streaming time series data from ARM distributed sensors

Abstract

The Atmospheric Radiation Measurement (ARM) Data Center at ORNL collects data from a number of permanent and mobile facilities around the globe. The data is then ingested to create high-level scientific products. High-frequency streaming measurements from sensors and radar instruments at ARM sites require a high degree of accuracy to enable rigorous study of atmospheric processes. Outliers in the collected data are common due to instrument failure or extreme weather events, so it is critical to identify and flag them. We employed multiple univariate, multivariate, and time series techniques for outlier detection and studied their effectiveness. First, we examined the Pearson correlation coefficient, which measures the pairwise correlations between variables. Singular Spectrum Analysis (SSA) was applied to detect outliers by removing the anticipated annual and seasonal cycles from the signal to accentuate anomalies. K-means was applied for multivariate examination of data from a collection of sensors to identify deviations from expected and known patterns and to identify abnormal observations. The Pearson correlation coefficient, SSA, and K-means methods were then combined in a framework that detects outliers through a range of checks. We applied the developed method to data from meteorological sensors at the ARM Southern Great Plains site and validated it against an existing database of known data quality issues.
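As an illustration of the first check named in the abstract, the following is a minimal sketch of a windowed Pearson correlation test between two sensors that are expected to track each other. It is not the authors' implementation: the function names `pearson` and `flag_windows`, the window length, the threshold `min_r`, and the sample readings are all assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)

def flag_windows(x, y, window=6, min_r=0.5):
    """Return start indices of non-overlapping windows whose correlation
    is undefined or falls below min_r (hypothetical threshold)."""
    flagged = []
    for start in range(0, len(x) - window + 1, window):
        r = pearson(x[start:start + window], y[start:start + window])
        if math.isnan(r) or r < min_r:
            flagged.append(start)
    return flagged

# Made-up readings from two co-located temperature sensors (illustrative only).
temp_a = [10, 11, 12, 13, 14, 15,
          16, 17, 18, 19, 20, 21,
          22, 23, 24, 25, 26, 27]
temp_b = [10.1, 11.2, 11.9, 13.1, 14.0, 15.2,
          18, 3, 40, 5, 30, 2,             # erratic segment in sensor b
          22.1, 23.0, 24.2, 24.9, 26.1, 27.0]

print(flag_windows(temp_a, temp_b))  # prints [6]
```

A low or undefined correlation only indicates that the two series disagree within a window; it does not say which sensor is at fault, which is one reason a framework like the one described would combine several checks.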

Authors:
Lu, Yuping [1]; Kumar, Jitendra [1]; Collier, Nathaniel O. [1]; Krishna, Bhargavi [1]; Langston, Michael A. [1]
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
OSTI Identifier:
1491320
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Conference on Data Mining Workshops, Singapore, 11/17/2018-11/20/2018
Country of Publication:
United States
Language:
English

Citation Formats

Lu, Yuping, Kumar, Jitendra, Collier, Nathaniel O., Krishna, Bhargavi, and Langston, Michael A. Detecting outliers in streaming time series data from ARM distributed sensors. United States: N. p., 2018. Web.
Lu, Yuping, Kumar, Jitendra, Collier, Nathaniel O., Krishna, Bhargavi, & Langston, Michael A. Detecting outliers in streaming time series data from ARM distributed sensors. United States.
Lu, Yuping, Kumar, Jitendra, Collier, Nathaniel O., Krishna, Bhargavi, and Langston, Michael A. 2018. "Detecting outliers in streaming time series data from ARM distributed sensors". United States. https://www.osti.gov/servlets/purl/1491320.
@article{osti_1491320,
title = {Detecting outliers in streaming time series data from ARM distributed sensors},
author = {Lu, Yuping and Kumar, Jitendra and Collier, Nathaniel O. and Krishna, Bhargavi and Langston, Michael A.},
abstractNote = {The Atmospheric Radiation Measurement (ARM) Data Center at ORNL collects data from a number of permanent and mobile facilities around the globe. The data is then ingested to create high-level scientific products. High-frequency streaming measurements from sensors and radar instruments at ARM sites require a high degree of accuracy to enable rigorous study of atmospheric processes. Outliers in the collected data are common due to instrument failure or extreme weather events, so it is critical to identify and flag them. We employed multiple univariate, multivariate, and time series techniques for outlier detection and studied their effectiveness. First, we examined the Pearson correlation coefficient, which measures the pairwise correlations between variables. Singular Spectrum Analysis (SSA) was applied to detect outliers by removing the anticipated annual and seasonal cycles from the signal to accentuate anomalies. K-means was applied for multivariate examination of data from a collection of sensors to identify deviations from expected and known patterns and to identify abnormal observations. The Pearson correlation coefficient, SSA, and K-means methods were then combined in a framework that detects outliers through a range of checks. We applied the developed method to data from meteorological sensors at the ARM Southern Great Plains site and validated it against an existing database of known data quality issues.},
place = {United States},
year = {2018},
month = {11}
}
