The event notification and alarm system for the Open Science Grid operations center

Abstract

The Open Science Grid (OSG) Operations Team operates a distributed set of services and tools that enable several HEP projects to use the OSG. Without these services, users of the OSG would not be able to run jobs, locate resources, obtain information about the status of systems, or generally use the OSG. For this reason, these services must be highly available. This paper describes the automated monitoring and notification systems used to diagnose and report problems. It covers the means used by OSG Operations to monitor physical facilities, network operations, server health, service availability, and software error events. Once detected, an error condition generates a message sent to a channel such as email, SMS, Twitter, or an instant message server. The paper emphasizes the mechanism being developed to integrate these monitoring systems into a prioritized and configurable alarming system.
Authors:
Hayashi, S; Teige, S; Quick, R [1]
  1. Indiana University, University Information Technology Services (United States)
Publication Date:
Dec 13, 2012
Product Type:
Journal Article
Resource Relation:
Journal Name: Journal of Physics. Conference Series (Online); Journal Volume: 396; Journal Issue: 3; Conference: CHEP2012: International conference on computing in high energy and nuclear physics 2012, New York, NY (United States), 21-25 May 2012; Other Information: Country of input: International Atomic Energy Agency (IAEA)
Subject:
46 INSTRUMENTATION RELATED TO NUCLEAR SCIENCE AND TECHNOLOGY; 97 MATHEMATICAL METHODS AND COMPUTING; ALARM SYSTEMS; COLLIDING BEAMS; COMPUTER CALCULATIONS; COMPUTER CODES; COMPUTER NETWORKS; DATA ACQUISITION; DATA TRANSMISSION; DISTRIBUTED DATA PROCESSING; ERRORS; MONITORING; MULTIPARTICLE SPECTROMETERS; MULTIPLE PRODUCTION; PARALLEL PROCESSING; PARTICLE IDENTIFICATION
OSTI ID:
22079316
Country of Origin:
United Kingdom
Language:
English
Other Identifying Numbers:
Journal ID: ISSN 1742-6596; TRN: GB13O2444038111
Availability:
Available from http://dx.doi.org/10.1088/1742-6596/396/3/032105
Submitting Site:
INIS
Size:
[5 page(s)]
Announcement Date:
Apr 04, 2013

Citation Formats

Hayashi, S, Teige, S, and Quick, R. The event notification and alarm system for the Open Science Grid operations center. United Kingdom: N. p., 2012. Web. doi:10.1088/1742-6596/396/3/032105.
Hayashi, S, Teige, S, & Quick, R. The event notification and alarm system for the Open Science Grid operations center. United Kingdom. doi:10.1088/1742-6596/396/3/032105.
Hayashi, S, Teige, S, and Quick, R. 2012. "The event notification and alarm system for the Open Science Grid operations center." United Kingdom. doi:10.1088/1742-6596/396/3/032105. https://www.osti.gov/servlets/purl/10.1088/1742-6596/396/3/032105.
@misc{etde_22079316,
title = {The event notification and alarm system for the Open Science Grid operations center},
author = {Hayashi, S and Teige, S and Quick, R},
abstractNote = {The Open Science Grid (OSG) Operations Team operates a distributed set of services and tools that enable several HEP projects to use the OSG. Without these services, users of the OSG would not be able to run jobs, locate resources, obtain information about the status of systems, or generally use the OSG. For this reason, these services must be highly available. This paper describes the automated monitoring and notification systems used to diagnose and report problems. It covers the means used by OSG Operations to monitor physical facilities, network operations, server health, service availability, and software error events. Once detected, an error condition generates a message sent to a channel such as email, SMS, Twitter, or an instant message server. The paper emphasizes the mechanism being developed to integrate these monitoring systems into a prioritized and configurable alarming system.},
doi = {10.1088/1742-6596/396/3/032105},
journal = {Journal of Physics. Conference Series (Online)},
issue = {3},
volume = {396},
journaltype = {AC},
place = {United Kingdom},
year = {2012},
month = {Dec}
}