OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Log Summarization and Anomaly Detection for Troubleshooting Distributed Systems

Abstract

Today's system monitoring tools are capable of detecting system failures such as host failures, OS errors, and network partitions in near-real time. Unfortunately, the same cannot yet be said of the end-to-end distributed software stack. Any given action, for example, reliably transferring a directory of files, can involve a wide range of complex and interrelated actions across multiple pieces of software: checking user certificates and permissions, getting details for all files, performing third-party transfers, understanding re-try policy decisions, etc. We present an infrastructure for troubleshooting complex middleware, a general purpose technique for configurable log summarization, and an anomaly detection technique that works in near-real time on running Grid middleware. We present results gathered using this infrastructure from instrumented Grid middleware and applications running on the Emulab testbed. From these results, we analyze the effectiveness of several algorithms at accurately detecting a variety of performance anomalies.

Authors:
Gunter, Dan; Tierney, Brian L.; Brown, Aaron; Swany, Martin; Bresnahan, John; Schopf, Jennifer M.
Publication Date:
August 2007
Research Org.:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Org.:
USDOE Director. Office of Science. Advanced Scientific Computing Research
OSTI Identifier:
932522
Report Number(s):
LBNL-63468
R&D Project: K11129; BnR: KJ0101030; TRN: US200813%%97
DOE Contract Number:  
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: IEEE Grid2007, Austin, TX, Sept 20-21, 2007
Country of Publication:
United States
Language:
English
Subject:
42; ALGORITHMS; DETECTION; MONITORING; PERFORMANCE; Grid Troubleshooting

Citation Formats

Gunter, Dan, Tierney, Brian L., Brown, Aaron, Swany, Martin, Bresnahan, John, and Schopf, Jennifer M. Log Summarization and Anomaly Detection for Troubleshooting Distributed Systems. United States: N. p., 2007. Web.
Gunter, Dan, Tierney, Brian L., Brown, Aaron, Swany, Martin, Bresnahan, John, & Schopf, Jennifer M. (2007). Log Summarization and Anomaly Detection for Troubleshooting Distributed Systems. United States.
Gunter, Dan, Tierney, Brian L., Brown, Aaron, Swany, Martin, Bresnahan, John, and Schopf, Jennifer M. 2007. "Log Summarization and Anomaly Detection for Troubleshooting Distributed Systems". United States. https://www.osti.gov/servlets/purl/932522.
@inproceedings{osti_932522,
title = {Log Summarization and Anomaly Detection for Troubleshooting Distributed Systems},
author = {Gunter, Dan and Tierney, Brian L. and Brown, Aaron and Swany, Martin and Bresnahan, John and Schopf, Jennifer M.},
abstractNote = {Today's system monitoring tools are capable of detecting system failures such as host failures, OS errors, and network partitions in near-real time. Unfortunately, the same cannot yet be said of the end-to-end distributed software stack. Any given action, for example, reliably transferring a directory of files, can involve a wide range of complex and interrelated actions across multiple pieces of software: checking user certificates and permissions, getting details for all files, performing third-party transfers, understanding re-try policy decisions, etc. We present an infrastructure for troubleshooting complex middleware, a general purpose technique for configurable log summarization, and an anomaly detection technique that works in near-real time on running Grid middleware. We present results gathered using this infrastructure from instrumented Grid middleware and applications running on the Emulab testbed. From these results, we analyze the effectiveness of several algorithms at accurately detecting a variety of performance anomalies.},
booktitle = {IEEE Grid2007},
address = {Austin, TX},
place = {United States},
url = {https://www.osti.gov/servlets/purl/932522},
year = {2007},
month = aug
}

