skip to main content

SciTech ConnectSciTech Connect

Title: Monitoring Cray Cooling Systems

While sites generally have systems in place to monitor the health of Cray computers themselves, often the cooling systems are ignored until a computer failure requires investigation into the source of the failure. The Liebert XDP units used to cool the Cray XE/XK models as well as the Cray proprietary cooling system used for the Cray XC30 models provide data useful for health monitoring. Unfortunately, this valuable information is often available only to custom solutions not accessible by a center-wide monitoring system or is simply ignored entirely. In this paper, methods and tools used to harvest the monitoring data available are discussed, and the implementation needed to integrate the data into a center-wide monitoring system at the Oak Ridge National Laboratory is provided.
 [1] ;  [1] ;  [2] ;  [1] ;  [1]
  1. ORNL
  2. Cray, Inc.
Publication Date:
OSTI Identifier:
DOE Contract Number:
Resource Type:
Resource Relation:
Conference: Cray User Group, Lugano, Switzerland, 20140505, 20140508
Research Org:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org:
USDOE Office of Science (SC)
Country of Publication:
United States