Title: Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data

Journal Article · Sandia journal manuscript; Not yet accepted for publication
OSTI ID:1427266
 [1];  [1];  [1];  [1];  [2]
  1. Univ. of Texas-Dallas, Richardson, TX (United States)
  2. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Anomaly detection refers to the identification of an irregular or unusual pattern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable challenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomalies. As a case study, we investigate group anomaly detection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint, with higher accuracy, by implementing a cluster-based technique that detects both sporadic and continuous anomalies. We conclude that our cluster-based technique outperforms other statistical techniques with higher accuracy and lower processing time.

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1427266
Report Number(s):
SAND-2015-8150J; 604082
Journal Information:
Journal Name: Sandia journal manuscript; Not yet accepted for publication; ISSN 9999-0014
Publisher:
Sandia
Country of Publication:
United States
Language:
English

Similar Records

Online anomaly detection for multi-source VMware using a distributed streaming framework
Journal Article · Mon Jan 11 00:00:00 EST 2016 · Software, Practice and Experience · OSTI ID:1427266

The Local Variational Multiscale Method for Turbulence Simulation.
Technical Report · Sun May 01 00:00:00 EDT 2005 · OSTI ID:1427266

Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning
Journal Article · Fri Sep 14 00:00:00 EDT 2018 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1427266