Spatiotemporal modeling of node temperatures in supercomputers
Abstract
Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Cooling the components of a supercomputer is therefore a critical and expensive endeavor. A project was recently initiated to investigate the effect that changes to the cooling system in a machine room had on three large machines housed there. Coupled with this goal was the aim of developing a general good practice for characterizing the effect of cooling changes and for monitoring machine node temperatures in this and other machine rooms. This paper focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1,600 nodes that run a variety of jobs during general use. Because extreme temperatures are important, a Normal distribution plus a generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. Lastly, the same approach can readily be applied to monitor and investigate cooling systems at other data centers.
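The marginal model named in the abstract (a Normal body with a generalized Pareto upper tail, mapped to a Gaussian copula scale) can be illustrated with a minimal sketch. This is not the authors' implementation: the splicing rule at the threshold, the function names, and all parameter values below are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def hybrid_cdf(y, mu, sigma, u, scale, xi):
    """CDF of a Normal(mu, sigma) body spliced with a GPD tail above u.

    Assumed splicing rule (illustrative, not taken from the paper):
      F(y) = Phi((y - mu) / sigma)                  for y <= u
      F(y) = p_u + (1 - p_u) * G(y - u; scale, xi)  for y >  u
    where p_u = Phi((u - mu) / sigma) and G is the GPD CDF.
    """
    body = NormalDist(mu, sigma)
    if y <= u:
        return body.cdf(y)
    p_u = body.cdf(u)
    z = (y - u) / scale
    if abs(xi) < 1e-12:
        # xi -> 0 limit of the GPD is the exponential distribution
        tail = 1.0 - math.exp(-z)
    else:
        base = 1.0 + xi * z
        # base <= 0 only when xi < 0 and y lies beyond the finite upper endpoint
        tail = 1.0 if base <= 0.0 else 1.0 - base ** (-1.0 / xi)
    return p_u + (1.0 - p_u) * tail


def to_copula_scale(y, **params):
    """Map a temperature to a standard-Normal score, z = Phi^{-1}(F(y))."""
    p = hybrid_cdf(y, **params)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(p)


# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(mu=60.0, sigma=5.0, u=70.0, scale=2.0, xi=0.1)
scores = [to_copula_scale(t, **params) for t in (55.0, 60.0, 70.0, 75.0, 90.0)]
```

On the transformed scale, dependence between nodes and across time would be described by a Gaussian process, which is the role the abstract assigns to the copula; the sketch covers only the marginal transformation.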
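- Authors:
- Storlie, Curtis Byron; Reich, Brian James; Rust, William Newton; Ticknor, Lawrence O.; Bonnie, Amanda Marie; Montoya, Andrew J.; Michalak, Sarah E.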
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- North Carolina State Univ., Raleigh, NC (United States)
- Publication Date:
- 2016-06-10
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1329590
- Report Number(s):
- LA-UR-15-22229; Journal ID: ISSN 0162-1459
- Grant/Contract Number:
- AC52-06NA25396
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of the American Statistical Association
- Additional Journal Information:
- Journal Name: Journal of the American Statistical Association; Journal ID: ISSN 0162-1459
- Publisher:
- Taylor & Francis
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Mathematics
Citation Formats
Storlie, Curtis Byron, Reich, Brian James, Rust, William Newton, Ticknor, Lawrence O., Bonnie, Amanda Marie, Montoya, Andrew J., and Michalak, Sarah E. Spatiotemporal modeling of node temperatures in supercomputers. United States: N. p., 2016. Web. doi:10.1080/01621459.2016.1195271.
Storlie, Curtis Byron, Reich, Brian James, Rust, William Newton, Ticknor, Lawrence O., Bonnie, Amanda Marie, Montoya, Andrew J., & Michalak, Sarah E. Spatiotemporal modeling of node temperatures in supercomputers. United States. https://doi.org/10.1080/01621459.2016.1195271
Storlie, Curtis Byron, Reich, Brian James, Rust, William Newton, Ticknor, Lawrence O., Bonnie, Amanda Marie, Montoya, Andrew J., and Michalak, Sarah E. 2016. "Spatiotemporal modeling of node temperatures in supercomputers". United States. https://doi.org/10.1080/01621459.2016.1195271. https://www.osti.gov/servlets/purl/1329590.
@article{osti_1329590,
title = {Spatiotemporal modeling of node temperatures in supercomputers},
author = {Storlie, Curtis Byron and Reich, Brian James and Rust, William Newton and Ticknor, Lawrence O. and Bonnie, Amanda Marie and Montoya, Andrew J. and Michalak, Sarah E.},
abstractNote = {Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Cooling the components of a supercomputer is therefore a critical and expensive endeavor. A project was recently initiated to investigate the effect that changes to the cooling system in a machine room had on three large machines housed there. Coupled with this goal was the aim of developing a general good practice for characterizing the effect of cooling changes and for monitoring machine node temperatures in this and other machine rooms. This paper focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1,600 nodes that run a variety of jobs during general use. Because extreme temperatures are important, a Normal distribution plus a generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. Lastly, the same approach can readily be applied to monitor and investigate cooling systems at other data centers.},
doi = {10.1080/01621459.2016.1195271},
journal = {Journal of the American Statistical Association},
place = {United States},
year = {2016},
month = {6}
}
Works referenced in this record:
A close look at the spatial structure implied by the CAR and SAR models
journal, April 2004
- Wall, Melanie M.
- Journal of Statistical Planning and Inference, Vol. 121, Issue 2
Space-time modelling of extreme events
journal, September 2013
- Huser, R.; Davison, A. C.
- Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 76, Issue 2
Hierarchical modeling for extreme values observed over space and time
journal, January 2008
- Sang, Huiyan; Gelfand, Alan E.
- Environmental and Ecological Statistics, Vol. 16, Issue 3
Modeling temporal gradients in regionally aggregated California asthma hospitalization data
journal, March 2013
- Quick, Harrison; Banerjee, Sudipto; Carlin, Bradley P.
- The Annals of Applied Statistics, Vol. 7, Issue 1
Fast sampling of Gaussian Markov random fields
journal, May 2001
- Rue, Havard
- Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 63, Issue 2
LINPACK Benchmark
book, January 2011
- Dongarra, Jack; Luszczek, Piotr; Feautrier, Paul
- Encyclopedia of Parallel Computing
Stationary max-stable fields associated to negative definite functions
journal, September 2009
- Kabluchko, Zakhar; Schlather, Martin; de Haan, Laurens
- The Annals of Probability, Vol. 37, Issue 5
Spatial process modelling for univariate and multivariate dynamic spatial data
journal, January 2005
- Gelfand, Alan E.; Banerjee, Sudipto; Gamerman, Dani
- Environmetrics, Vol. 16, Issue 5
A Hybrid Pareto Mixture for Conditional Asymmetric Fat-Tailed Distributions
journal, July 2009
- Carreau, J.; Bengio, Y.
- IEEE Transactions on Neural Networks, Vol. 20, Issue 7
Spatiotemporal quantile regression for detecting distributional changes in environmental processes
journal, January 2012
- Reich, Brian J.
- Journal of the Royal Statistical Society: Series C (Applied Statistics), Vol. 61, Issue 4
Regression B-spline smoothing in Bayesian disease mapping: with an application to patient safety surveillance
journal, January 2007
- MacNab, Ying C.; Gustafson, Paul
- Statistics in Medicine, Vol. 26, Issue 24
Spatial Analyses of Periodontal Data Using Conditionally Autoregressive Priors Having Two Classes of Neighbor Relations
journal, March 2007
- Reich, Brian J.; Hodges, James S.; Carlin, Bradley P.
- Journal of the American Statistical Association, Vol. 102, Issue 477
Assessment of the Impact of Cosmic-Ray-Induced Neutrons on Hardware in the Roadrunner Supercomputer
journal, June 2012
- Michalak, Sarah E.; DuBois, Andrew J.; Storlie, Curtis B.
- IEEE Transactions on Device and Materials Reliability, Vol. 12, Issue 2
Classes of Nonseparable, Spatio-Temporal Stationary Covariance Functions
journal, December 1999
- Cressie, Noel; Huang, Hsin-Cheng
- Journal of the American Statistical Association, Vol. 94, Issue 448
Estimating the tail-dependence coefficient: Properties and pitfalls
journal, August 2005
- Frahm, Gabriel; Junker, Markus; Schmidt, Rafael
- Insurance: Mathematics and Economics, Vol. 37, Issue 1
Nonseparable, Stationary Covariance Functions for Space–Time Data
journal, June 2002
- Gneiting, Tilmann
- Journal of the American Statistical Association, Vol. 97, Issue 458
A hierarchical max-stable spatial model for extreme precipitation
journal, December 2012
- Reich, Brian J.; Shaby, Benjamin A.
- The Annals of Applied Statistics, Vol. 6, Issue 4
The t Copula and Related Copulas
journal, April 2005
- Demarta, Stefano; McNeil, Alexander J.
- International Statistical Review, Vol. 73, Issue 1
A hybrid Pareto model for asymmetric fat-tailed data: the univariate case
journal, August 2008
- Carreau, Julie; Bengio, Yoshua
- Extremes, Vol. 12, Issue 1
A study of logspline density estimation
journal, November 1991
- Kooperberg, Charles; Stone, Charles J.
- Computational Statistics & Data Analysis, Vol. 12, Issue 3
Efficient inference for spatial extreme value processes associated to log-Gaussian random functions
journal, November 2013
- Wadsworth, J. L.; Tawn, J. A.
- Biometrika, Vol. 101, Issue 1
Gaussian Markov Random Fields: Theory and Applications
book, February 2005
- Rue, Havard; Held, Leonhard
Likelihood-Based Inference for Max-Stable Processes
journal, March 2010
- Padoan, S. A.; Ribatet, M.; Sisson, S. A.
- Journal of the American Statistical Association, Vol. 105, Issue 489
Space–Time Covariance Functions
journal, March 2005
- Stein, Michael L.
- Journal of the American Statistical Association, Vol. 100, Issue 469
A Bayesian Reliability Analysis of Neutron-Induced Errors in High Performance Computing Hardware
journal, June 2013
- Storlie, Curtis B.; Michalak, Sarah E.; Quinn, Heather M.
- Journal of the American Statistical Association, Vol. 108, Issue 502
Hierarchical Spatio-Temporal Mapping of Disease Rates
journal, June 1997
- Waller, Lance A.; Carlin, Bradley P.; Xia, Hong
- Journal of the American Statistical Association, Vol. 92, Issue 438
An Approach to Statistical Spatial-Temporal Modeling of Meteorological Fields
journal, June 1994
- Handcock, Mark S.; Wallis, James R.
- Journal of the American Statistical Association, Vol. 89, Issue 426
Statistical Modeling of Spatial Extremes
journal, May 2012
- Davison, A. C.; Padoan, S. A.; Ribatet, M.
- Statistical Science, Vol. 27, Issue 2
Extreme value analysis for evaluating ozone control strategies
journal, June 2013
- Reich, Brian; Cooley, Daniel; Foley, Kristen
- The Annals of Applied Statistics, Vol. 7, Issue 2
Dependence modelling for spatial extremes
journal, March 2012
- Wadsworth, J. L.; Tawn, J. A.
- Biometrika, Vol. 99, Issue 2
An Introduction to Copulas
journal, August 2000
- Rayens, Bill; Nelsen, Roger B.
- Technometrics, Vol. 42, Issue 3
Stationary max-stable fields associated to negative definite functions
text, January 2008
- Kabluchko, Zakhar; Schlather, Martin; de Haan, Laurens
- arXiv
A hierarchical max-stable spatial model for extreme precipitation
text, January 2013
- Reich, Brian J.; Shaby, Benjamin A.
- arXiv
Extreme value analysis for evaluating ozone control strategies
text, January 2013
- Reich, Brian; Cooley, Daniel; Foley, Kristen
- arXiv
Works referencing / citing this record:
Modeling material stress using integrated Gaussian Markov random fields
journal, November 2019
- Marcy, Peter W.; Vander Wiel, Scott A.; Storlie, Curtis B.
- Journal of Applied Statistics, Vol. 47, Issue 9
Modeling Material Stress Using Integrated Gaussian Markov Random Fields
text, January 2019
- Marcy, Peter W.; Wiel, Scott A. Vander; Storlie, Curtis B.
- arXiv