U.S. Department of Energy
Office of Scientific and Technical Information

Towards a specification for measuring Red Storm reliability, availability, and serviceability (RAS).

Conference · OSTI ID: 972870

The absence of agreed definitions and metrics for supercomputer RAS obscures meaningful discussion of the issues involved, hinders their solution, and increases total system cost. Seeking to foster a common basis for communication about supercomputer RAS, [1] proposed a general system state model, definitions, and measurements based on the SEMI-E10 specification [2] used in the semiconductor manufacturing industry. This document enumerates the platform-specific details needed to apply that general framework to the Red Storm system at Sandia National Laboratories. Familiarity with [1] is a strong prerequisite for understanding this document, as is familiarity with the Red Storm RAS subsystem (although to a much lesser degree). Given Red Storm's current pre-production status, this document does not specify actual policy or practice; rather, it proposes a framework by which to measure RAS performance on Red Storm.
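To illustrate the kind of state-based measurement the framework relies on, the Python sketch below tallies hours spent in six basic equipment states and derives availability-style metrics from them. The state names follow the six basic states of SEMI E10 (productive, standby, engineering, scheduled downtime, unscheduled downtime, non-scheduled). Everything else is a hypothetical illustration: the class and function names, the metric formulas as written here, and the sample hours. None of it is the definitions proposed in [1] or the Red Storm state mapping this document specifies.

```python
from dataclasses import dataclass


@dataclass
class StateHours:
    """Hypothetical record of hours spent in each SEMI E10-style basic state."""
    productive: float
    standby: float
    engineering: float
    scheduled_down: float
    unscheduled_down: float
    non_scheduled: float

    @property
    def uptime(self) -> float:
        # Up states: the system can perform its intended function.
        return self.productive + self.standby + self.engineering

    @property
    def downtime(self) -> float:
        # Down states: planned maintenance plus failures and repair.
        return self.scheduled_down + self.unscheduled_down

    @property
    def operations_time(self) -> float:
        # Time the system was scheduled to operate (excludes non-scheduled time).
        return self.uptime + self.downtime

    @property
    def total_time(self) -> float:
        return self.operations_time + self.non_scheduled


def operational_availability(h: StateHours) -> float:
    """Fraction of operations time spent in an up state (assumed formula)."""
    return h.uptime / h.operations_time


def mean_productive_time_between_failures(h: StateHours, failures: int) -> float:
    """Productive hours per failure; infinite when no failures occurred (assumed formula)."""
    return h.productive / failures if failures else float("inf")


def mean_time_to_repair(h: StateHours, failures: int) -> float:
    """Unscheduled-downtime hours per failure (assumes all unscheduled downtime is repair)."""
    return h.unscheduled_down / failures if failures else 0.0


if __name__ == "__main__":
    # Made-up sample data: one 168-hour week with three failures.
    week = StateHours(productive=120, standby=20, engineering=4,
                      scheduled_down=8, unscheduled_down=6, non_scheduled=10)
    print(f"total hours:  {week.total_time}")
    print(f"availability: {operational_availability(week):.3f}")
    print(f"MTBF (prod.): {mean_productive_time_between_failures(week, 3)} h")
    print(f"MTTR:         {mean_time_to_repair(week, 3)} h")
```

With the made-up week above, the system is up for 144 of 158 operations hours (availability of about 0.911). Dividing 120 productive hours and 6 unscheduled-downtime hours by three failures gives 40 and 2 hours respectively. Keeping raw state hours, rather than only derived percentages, lets any of these metrics be recomputed later under a different definition.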

Research Organization: Sandia National Laboratories
Sponsoring Organization: USDOE
DOE Contract Number: AC04-94AL85000
Report Number(s): SAND2005-3018C
Country of Publication: United States
Language: English