Title: Exploring Process Groups for Reliability, Availability and Serviceability of Terascale Computing Systems

Abstract

This paper presents various aspects of reliability, availability and serviceability (RAS) systems as they relate to group communication service, including reliable and total order multicast/broadcast, virtual synchrony, and failure detection. Although availability, particularly high availability using replication-based architectures, has recently attracted a surge of research interest, much remains to be done in understanding the basic underlying concepts for achieving RAS systems, especially in the high-end and high-performance computing (HPC) communities. Various attributes of group communication service and a prototype of symmetric active replication following ideas used in the Newtop protocol are discussed. We explore the application of group communication service for RAS in HPC, laying the groundwork for an integrated model.

Authors:
Okunbor, Daniel Irowa [1]; Engelmann, Christian [1]; Scott, Steven L. [1]
  1. ORNL
Publication Date:
2006
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program
OSTI Identifier:
989650
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2nd International Conference on Computer Science and Information Systems 2006, Athens, Greece, June 19-21, 2006
Country of Publication:
United States
Language:
English

Citation Formats

Okunbor, Daniel Irowa, Engelmann, Christian, and Scott, Steven L. Exploring Process Groups for Reliability, Availability and Serviceability of Terascale Computing Systems. United States: N. p., 2006. Web.
Okunbor, Daniel Irowa, Engelmann, Christian, & Scott, Steven L. (2006). Exploring Process Groups for Reliability, Availability and Serviceability of Terascale Computing Systems. United States.
Okunbor, Daniel Irowa, Engelmann, Christian, and Scott, Steven L. 2006. "Exploring Process Groups for Reliability, Availability and Serviceability of Terascale Computing Systems". United States.
@article{osti_989650,
title = {Exploring Process Groups for Reliability, Availability and Serviceability of Terascale Computing Systems},
author = {Okunbor, Daniel Irowa and Engelmann, Christian and Scott, Steven L},
abstractNote = {This paper presents various aspects of reliability, availability and serviceability (RAS) systems as they relate to group communication service, including reliable and total order multicast/broadcast, virtual synchrony, and failure detection. Although availability, particularly high availability using replication-based architectures, has recently attracted a surge of research interest, much remains to be done in understanding the basic underlying concepts for achieving RAS systems, especially in the high-end and high-performance computing (HPC) communities. Various attributes of group communication service and a prototype of symmetric active replication following ideas used in the Newtop protocol are discussed. We explore the application of group communication service for RAS in HPC, laying the groundwork for an integrated model.},
place = {United States},
year = {2006}
}
