Towards High Availability for High-Performance Computing System Services: Accomplishments and Limitations
Conference · OSTI ID:931290
- ORNL
- Louisiana Tech University
- Tennessee Technological University
During the last several years, our teams at Oak Ridge National Laboratory, Louisiana Tech University, and Tennessee Technological University focused on efficient redundancy strategies for head and service nodes of high-performance computing (HPC) systems in order to pave the way for high availability (HA) in HPC. These nodes typically run critical HPC system services, like job and resource management, and represent single points of failure and control for an entire HPC system. The overarching goal of our research is to provide high-level reliability, availability, and serviceability (RAS) for HPC systems by combining HA and HPC technology. This paper summarizes our accomplishments, such as developed concepts and implemented proof-of-concept prototypes, and describes existing limitations, such as performance issues, which need to be dealt with for production-type deployment.
- Research Organization: Oak Ridge National Laboratory (ORNL)
- Sponsoring Organization: ORNL LDRD Director's R&D; SC USDOE - Office of Science (SC)
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 931290
- Country of Publication: United States
- Language: English
Similar Records
Symmetric Active/Active High Availability for High-Performance Computing System Services
Journal Article · Sat Dec 31 23:00:00 EST 2005 · Journal of Computers · OSTI ID:978718
Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations
Conference · Mon Dec 31 23:00:00 EST 2007 · OSTI ID:945336
JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management
Conference · Sat Dec 31 23:00:00 EST 2005 · OSTI ID:930763