Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data
- Univ. of Texas-Dallas, Richardson, TX (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Anomaly detection refers to the identification of an irregular or unusual pattern that deviates from what is standard, normal, or expected. Such deviating patterns typically correspond to samples of interest and are given different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable challenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data in which anomalies appear as a group. We have developed a distributed statistical approach that builds a model and later uses it to detect anomalies. As a case study, we investigate group anomaly detection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We built our framework on Apache Spark to obtain higher throughput and lower data processing time on streaming data. We developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint, and achieved higher accuracy, by implementing a cluster-based technique that detects both sporadic and continuous anomalies. We conclude that our cluster-based technique outperforms other statistical techniques, with higher accuracy and lower processing time.
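The abstract does not specify which statistic the window-based technique uses, so the following is only a minimal, single-process sketch of the general idea of building a model from normal data and then testing windows of new samples against it. It assumes a Gaussian baseline (mean and population standard deviation of one performance metric, such as VM CPU usage) and a hypothetical group rule: a window is flagged when at least `min_fraction` of its samples lie more than `k` standard deviations from the baseline mean. The names `build_model`, `window_is_anomalous`, and `detect`, and the default values of `k`, `min_fraction`, and `window_size`, are illustrative assumptions, not the authors' implementation, and Spark distribution is omitted.

```python
from statistics import mean, pstdev


def build_model(training_values):
    """Fit a simple baseline (mean, population std) from normal training data.

    Assumption: one numeric performance metric, roughly Gaussian when normal.
    """
    return mean(training_values), pstdev(training_values)


def window_is_anomalous(window, model, k=3.0, min_fraction=0.5):
    """Flag a window as a group anomaly when at least `min_fraction` of its
    samples deviate from the baseline mean by more than `k` standard deviations.

    `k` and `min_fraction` are hypothetical defaults chosen for illustration.
    """
    mu, sigma = model
    if not window:
        return False
    if sigma == 0:
        # Degenerate baseline: any value different from the mean is an outlier.
        outliers = sum(1 for x in window if x != mu)
    else:
        outliers = sum(1 for x in window if abs(x - mu) > k * sigma)
    return outliers / len(window) >= min_fraction


def detect(stream, model, window_size=5, **kwargs):
    """Split the stream into non-overlapping (tumbling) windows and return the
    start index of every window flagged as anomalous. A trailing partial
    window shorter than `window_size` is ignored."""
    flagged = []
    for start in range(0, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        if window_is_anomalous(window, model, **kwargs):
            flagged.append(start)
    return flagged


if __name__ == "__main__":
    model = build_model([50, 52, 48, 51, 49, 50, 52, 48])  # mean 50, std 1.5
    stream = [50, 51, 49, 52, 48, 90, 92, 95, 91, 50, 49, 51, 50, 52, 48]
    print(detect(stream, model))  # prints [5]
```

Requiring a fraction of outliers per window, rather than reacting to single points, is one simple way to target anomalies that "appear as a group" while tolerating isolated noisy samples. In a Spark deployment the per-window test would run over partitioned streams, which this sketch does not attempt.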
- Research Organization:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number:
- AC04-94AL85000
- OSTI ID:
- 1427266
- Report Number(s):
- SAND-2015-8150J; 604082
- Journal Information:
- Sandia journal manuscript; not yet accepted for publication; ISSN 9999-0014
- Publisher:
- Sandia
- Country of Publication:
- United States
- Language:
- English
Similar Records
The Local Variational Multiscale Method for Turbulence Simulation.
Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning