Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations
Conference · OSTI ID:945336
- ORNL
- Louisiana Tech University
- Tennessee Technological University
This paper summarizes our efforts over the past three to four years in providing symmetric active/active high availability for high-performance computing (HPC) system services. This work paves the way for high-level reliability, availability and serviceability in extreme-scale HPC systems by focusing on the most critical components, the head and service nodes, and by reinforcing them with appropriate high availability solutions. The paper presents our accomplishments in the form of concepts and their respective prototypes, discusses existing limitations, outlines possible future work, and describes the relevance of this research to other planned efforts.
- Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE Office of Science (SC)
- DOE Contract Number: DE-AC05-00OR22725
- OSTI ID: 945336
- Resource Relation: Conference: 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid) 2008, Lyon, France, May 19-22, 2008
- Country of Publication: United States
- Language: English
Similar Records
Towards High Availability for High-Performance Computing System Services: Accomplishments and Limitations
Conference · January 1, 2006 · OSTI ID:945336
Symmetric Active/Active High Availability for High-Performance Computing System Services
Journal Article · January 1, 2006 · Journal of Computers · OSTI ID:945336
JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management
Conference · January 1, 2006 · OSTI ID:945336