Online anomaly detection for multi‐source VMware using a distributed streaming framework
- Department of Computer Science, The University of Texas at Dallas, Dallas, TX 75080, USA
- Sandia National Laboratories, Albuquerque, NM 87123, USA
- Department of Business, Istanbul Medeniyet University, Istanbul, Turkey
Anomaly detection refers to the identification of patterns in a dataset that do not conform to expected patterns. Such non‐conformant patterns typically correspond to samples of interest and go by different names in different domains, such as outliers, anomalies, exceptions, and malware. A daunting challenge is to detect anomalies in rapid, voluminous data streams.
This paper presents a novel, generic, real‐time distributed anomaly detection framework for multi‐source stream data. As a case study, we investigate anomaly detection for a multi‐source VMware‐based cloud data center that maintains a large number of virtual machines (VMs). The framework continuously monitors VMware performance stream data related to CPU statistics (e.g., load and usage). It collects data simultaneously from all VMs connected to the network and, when it identifies abnormal behavior in the collected data, notifies the resource manager to reschedule its CPU resources dynamically. A semi‐supervised clustering technique builds a model from benign training data only. During testing, any data instance that deviates significantly from the model is flagged as an anomaly.
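The abstract describes the detection idea without naming the clustering algorithm. Below is a minimal sketch that assumes k-means (a swapped-in choice, not necessarily the authors' method) over two hypothetical features per VM sample (CPU usage % and CPU load). It builds a model from benign samples only, then flags any incoming tuple whose distance to the nearest benign centroid exceeds a threshold. The names `BenignModel` and `detect_stream`, the `slack` factor of 1.5, and the synthetic data are all illustrative assumptions. This is a single-process Python illustration, not the Spark or Storm implementation.

```python
# Illustrative sketch only: k-means and the slack-based threshold are
# assumptions, not the paper's confirmed method.
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[Point]:
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


class BenignModel:
    """Hypothetical model trained on benign samples only (semi-supervised setting)."""

    def __init__(self, benign: List[Point], k: int = 2, slack: float = 1.5) -> None:
        self.centroids = kmeans(benign, k)
        # Assumed threshold: slack times the largest benign distance to its nearest centroid.
        self.threshold = slack * max(self.score(p) for p in benign)

    def score(self, p: Sequence[float]) -> float:
        """Distance from p to the closest benign centroid."""
        return min(dist(p, c) for c in self.centroids)

    def is_anomaly(self, p: Sequence[float]) -> bool:
        return self.score(p) > self.threshold


def detect_stream(stream: Iterable[Tuple[str, Point]], model: BenignModel) -> Iterator[Tuple[str, Point]]:
    """Score each (vm_id, features) tuple as it arrives and yield the anomalous ones."""
    for vm_id, features in stream:
        if model.is_anomaly(features):
            yield vm_id, features


# Synthetic benign data: (CPU usage %, CPU load) around two operating regimes.
benign = [(20.0 + i % 5, 1.0 + 0.1 * (i % 3)) for i in range(30)]
benign += [(60.0 + i % 5, 2.0 + 0.1 * (i % 3)) for i in range(30)]
model = BenignModel(benign, k=2)

incoming = [("vm-1", (21.0, 1.0)), ("vm-2", (95.0, 8.0)), ("vm-3", (61.0, 2.2))]
flagged = list(detect_stream(incoming, model))
print(flagged)  # [('vm-2', (95.0, 8.0))]
```

Because the model sees only benign data, the threshold encodes "normal" behavior, and a sample far from every benign cluster is flagged without ever needing labeled anomalies. In a distributed deployment, the per-tuple scoring in `detect_stream` is the part that would naturally run in parallel across stream workers.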
Effective anomaly detection in this setting demands a distributed framework with high throughput and low latency. Distributed streaming frameworks such as Apache Storm, Apache Spark, and S4 are designed for lower data processing time and higher throughput than standard centralized frameworks. We experimentally compared the average processing latency of a tuple during clustering and prediction in both Spark and Storm and showed that, on average, Spark processes a tuple much faster than Storm. Copyright © 2016 John Wiley & Sons, Ltd.
- Sponsoring Organization: USDOE
- Grant/Contract Number: AC04-94AL85000
- OSTI ID: 1400533
- Journal Information: Software, Practice and Experience, Vol. 46, Issue 11; ISSN 0038-0644
- Publisher: Wiley Blackwell (John Wiley & Sons)
- Country of Publication: United Kingdom
- Language: English