Monitoring Cray Cooling Systems
While sites generally have systems in place to monitor the health of Cray computers themselves, often the cooling systems are ignored until a computer failure requires investigation into the source of the failure. The Liebert XDP units used to cool the Cray XE/XK models as well as the Cray proprietary cooling system used for the Cray XC30 models provide data useful for health monitoring. Unfortunately, this valuable information is often available only to custom solutions not accessible by a center-wide monitoring system or is simply ignored entirely. In this paper, methods and tools used to harvest the monitoring data available are discussed, and the implementation needed to integrate the data into a center-wide monitoring system at the Oak Ridge National Laboratory is provided.
- Publication Date:
- OSTI Identifier:
- DOE Contract Number:
- Resource Type:
- Resource Relation:
- Conference: Cray User Group, Lugano, Switzerland, 20140505, 20140508
- Research Org:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org:
- USDOE Office of Science (SC)
- Country of Publication:
- United States
Enter terms in the toolbar above to search the full text of this document for pages containing specific keywords.