U.S. Department of Energy
Office of Scientific and Technical Information

Online anomaly detection for multi‐source VMware using a distributed streaming framework

Journal Article · Software, Practice and Experience
DOI: https://doi.org/10.1002/spe.2390 · OSTI ID: 1400533
 [1];  [1];  [1];  [1];  [2];  [3]
  1. Department of Computer Science, The University of Texas at Dallas, Dallas, TX 75080, USA
  2. Sandia National Laboratories, Albuquerque, NM 87123, USA
  3. Department of Business, Istanbul Medeniyet University, Istanbul, Turkey
Summary

Anomaly detection refers to the identification of patterns in a dataset that do not conform to expected behavior. Such non‐conformant patterns typically correspond to samples of interest and go by different names in different domains, such as outliers, anomalies, exceptions, and malware. A daunting challenge is detecting anomalies in rapid, voluminous data streams.

This paper presents a novel, generic, real‐time distributed anomaly detection framework for multi‐source stream data. As a case study, we investigate anomaly detection for a multi‐source VMware‐based cloud data center that maintains a large number of virtual machines (VMs). The framework continuously monitors VMware performance stream data related to CPU statistics (e.g., load and usage). It collects data simultaneously from all VMs connected to the network and, when it identifies abnormal behavior in the collected data, notifies the resource manager so that CPU resources can be rescheduled dynamically. A semi‐supervised clustering technique builds a model from benign training data only. During testing, any data instance that deviates significantly from the model is flagged as an anomaly.
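The train-on-benign, flag-on-deviation scheme described above can be sketched in a few lines. The abstract does not name the clustering algorithm, the distance measure, or the deviation threshold, so this is a minimal sketch under stated assumptions: plain k-means stands in for the paper's clustering step, each cluster's boundary is the distance of its farthest benign training point scaled by a hypothetical `slack` factor, and `ClusterAnomalyModel`, `monitor`, and the `notify` callback (standing in for the resource-manager notification) are illustrative names, not the authors' code or any Spark or Storm API.

```python
import math
import random
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(data: List[Point], k: int, iters: int = 50, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means (a stand-in, not the paper's stated algorithm)."""
    rng = random.Random(seed)
    centroids = list(rng.sample(data, k))
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: _dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = column-wise mean; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids


class ClusterAnomalyModel:
    """Semi-supervised detector: fit on benign data only, flag far-away points."""

    def __init__(self, k: int = 2, slack: float = 1.5, seed: int = 0):
        self.k, self.slack, self.seed = k, slack, seed  # slack is a hypothetical knob
        self.centroids: List[Point] = []
        self.radii: List[float] = []

    def _nearest(self, p: Point) -> Tuple[int, float]:
        dists = [_dist(p, c) for c in self.centroids]
        i = min(range(len(dists)), key=dists.__getitem__)
        return i, dists[i]

    def fit(self, benign: List[Point]) -> "ClusterAnomalyModel":
        self.centroids = kmeans(benign, self.k, seed=self.seed)
        self.radii = [0.0] * self.k
        for p in benign:
            i, d = self._nearest(p)
            self.radii[i] = max(self.radii[i], d)  # farthest benign member per cluster
        return self

    def is_anomaly(self, p: Point) -> bool:
        i, d = self._nearest(p)
        return d > self.slack * self.radii[i]


def monitor(stream: Iterable[Tuple[str, Point]], model: ClusterAnomalyModel,
            notify: Callable[[str, Point], None]) -> None:
    """Check each (vm_id, features) sample and call notify() for anomalies."""
    for vm_id, features in stream:
        if model.is_anomaly(features):
            notify(vm_id, features)


if __name__ == "__main__":
    # Made-up benign (CPU load, CPU usage %) samples forming two operating modes.
    benign = [(0.5, 30.0), (0.6, 32.0), (0.4, 28.0), (2.0, 70.0), (2.1, 72.0), (1.9, 68.0)]
    model = ClusterAnomalyModel(k=2, slack=1.5).fit(benign)
    alerts: List[str] = []
    monitor([("vm-1", (0.55, 31.0)), ("vm-2", (3.5, 95.0))], model,
            lambda vm, f: alerts.append(vm))
    print(alerts)  # ['vm-2']
```

Features are not normalized in this sketch, so the CPU-usage column dominates the distance; a real deployment would likely scale load and usage to comparable ranges before clustering.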

Effective anomaly detection in this setting demands a distributed framework with high throughput and low latency. Distributed streaming frameworks such as Apache Storm, Apache Spark, and S4 are designed to achieve lower data processing time and higher throughput than standard centralized frameworks. We experimentally compared the average processing latency of a tuple during clustering and prediction in both Spark and Storm, and found that, on average, Spark processes a tuple much faster than Storm. Copyright © 2016 John Wiley & Sons, Ltd.
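The metric compared above, average processing latency per tuple, is the mean of the time spent handling each individual tuple. The sketch below computes it for an arbitrary per-tuple function in a single Python process. It is an illustration under explicit assumptions: `average_tuple_latency` is a hypothetical helper, it is not how Spark or Storm report latency, and it ignores the queuing, serialization, and network time a distributed deployment would add; the abstract does not describe the authors' measurement setup.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def average_tuple_latency(tuples: Iterable[T], process: Callable[[T], object]) -> float:
    """Return the mean wall-clock time, in seconds, that process() takes per tuple."""
    latencies: List[float] = []
    for item in tuples:
        start = time.perf_counter()
        process(item)  # e.g. one clustering or prediction step for this tuple
        latencies.append(time.perf_counter() - start)
    if not latencies:
        raise ValueError("no tuples were processed")
    return mean(latencies)


if __name__ == "__main__":
    samples = [(0.5, 30.0), (2.0, 70.0), (3.5, 95.0)]
    # Hypothetical per-tuple work: Euclidean norm of the feature vector.
    avg = average_tuple_latency(samples, lambda t: sum(x * x for x in t) ** 0.5)
    print(f"average latency: {avg * 1e6:.2f} microseconds per tuple")
```

Comparing two engines this way only makes sense when both run the same `process` logic over the same input stream; otherwise the difference reflects the workload rather than the framework.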

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1400533
Journal Information:
Software, Practice and Experience, Vol. 46, Issue 11; ISSN 0038-0644
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English

Similar Records

Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data
Journal Article · Tue Sep 01 00:00:00 EDT 2015 · Sandia journal manuscript; Not yet accepted for publication · OSTI ID:1427266

Fault-Tolerant and Elastic Streaming MapReduce with Decentralized Coordination
Conference · Mon Jun 29 00:00:00 EDT 2015 · OSTI ID:1332339

DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark
Journal Article · Thu Oct 10 20:00:00 EDT 2019 · BMC Bioinformatics · OSTI ID:1618535