OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Spatiotemporal modeling of node temperatures in supercomputers

Journal Article · Journal of the American Statistical Association
 [1];  [2];  [1];  [1];  [1];  [1];  [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. North Carolina State Univ., Raleigh, NC (United States)

Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Thus, cooling the components of the supercomputer becomes a critical and expensive endeavor. Recently a project was initiated to investigate the effect that changes to the cooling system in a machine room had on three large machines housed there. Coupled with this goal was the aim to develop a general good practice for characterizing the effect of cooling changes and monitoring machine node temperatures in this and other machine rooms. This paper focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1,600 nodes that run a variety of jobs during general use. Since extreme temperatures are important, a Normal distribution plus a generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. Lastly, this same approach can easily be applied to monitor and investigate cooling systems at other data centers.
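To make the marginal model in the abstract concrete, here is a minimal sketch of one common way to splice a Normal body onto a generalized Pareto upper tail, and then map a temperature to a standard-normal score as a Gaussian copula would require. The abstract does not give the paper's parameterization, so the splicing rule, the function names (`gpd_cdf`, `spliced_cdf`, `normal_score`), and all parameter values below are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def gpd_cdf(y, xi, scale):
    """CDF of the generalized Pareto distribution for an excess y over the threshold."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:  # xi -> 0 limit is the exponential distribution
        return 1.0 - math.exp(-y / scale)
    base = 1.0 + xi * y / scale
    if base <= 0.0:  # beyond the finite upper endpoint when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def spliced_cdf(x, mu, sigma, u, xi, scale):
    """Assumed splice: Normal(mu, sigma) CDF up to threshold u, GPD tail above it.

    The tail mass 1 - Phi(u) is redistributed by the GPD, so the CDF is
    continuous at u and still reaches 1.
    """
    body = NormalDist(mu, sigma)
    p_u = body.cdf(u)
    if x <= u:
        return body.cdf(x)
    return p_u + (1.0 - p_u) * gpd_cdf(x - u, xi, scale)


def normal_score(x, mu, sigma, u, xi, scale):
    """Probability integral transform to a standard-normal score (copula scale)."""
    p = spliced_cdf(x, mu, sigma, u, xi, scale)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(p)


# Hypothetical parameter values (not taken from the paper), temperatures in deg C.
params = dict(mu=60.0, sigma=5.0, u=70.0, xi=0.1, scale=2.0)
scores = [normal_score(t, **params) for t in (55.0, 60.0, 70.0, 80.0)]
```

In a copula model of this kind, scores such as `scores` would be the quantities given a Gaussian process dependence structure, while the spliced marginal controls how heavy the upper tail of node temperature is.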

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1329590
Report Number(s):
LA-UR-15-22229
Journal Information:
Journal Name: Journal of the American Statistical Association; ISSN 0162-1459
Publisher:
Taylor & Francis
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 2 works
Citation information provided by
Web of Science

Cited By (2)

Modeling material stress using integrated Gaussian Markov random fields journal November 2019
Modeling Material Stress Using Integrated Gaussian Markov Random Fields text January 2019