U.S. Department of Energy
Office of Scientific and Technical Information

Meaningful statistical analysis of large computational clusters.

Technical Report · DOI: https://doi.org/10.2172/958384 · OSTI ID: 958384

Effective monitoring of large computational clusters demands the analysis of a vast amount of raw data from a large number of machines. The fundamental interactions of the system are not, however, well defined, which makes it difficult to draw meaningful conclusions from this data even if one could handle and process it efficiently. In this paper we show that computational clusters, because they are composed of a large number of identical machines, behave in a statistically meaningful fashion. We can therefore employ normal statistical methods to derive information about individual systems and their environment, and to detect problems sooner than with traditional mechanisms. We discuss the design details necessary to use these methods on a large system in a timely and low-impact fashion.
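
The abstract does not say which statistical methods the report uses, so the following Python sketch is only an illustration of the general idea: if many identical machines report the same metric, each machine can be compared against the distribution of its peers. The function name `flag_outliers`, the z-score test, and the threshold of 3 standard deviations are all assumptions made for this example and are not taken from the report.

```python
from statistics import mean, stdev


def flag_outliers(readings, threshold=3.0):
    """Return {node: z_score} for nodes far from the cluster mean.

    `readings` maps a node name to one numeric sample of the same metric
    (for example a temperature). The z-score test and the default
    threshold are illustrative assumptions, not the report's method.
    """
    values = list(readings.values())
    if len(values) < 2:
        return {}  # a sample standard deviation needs at least two values

    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation across nodes
    if sigma == 0:
        return {}  # every node reports the same value, nothing stands out

    flagged = {}
    for node, value in readings.items():
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged[node] = z
    return flagged


if __name__ == "__main__":
    # Hypothetical data: twenty nodes near 40 units and one running hot.
    sample = {f"node{i:02d}": 40.0 + (i % 3) * 0.5 for i in range(20)}
    sample["node20"] = 80.0
    for node, z in flag_outliers(sample).items():
        print(f"{node}: z = {z:.2f}")
```

A single extreme reading also inflates the mean and standard deviation it is measured against, so on small clusters a robust alternative such as the median and median absolute deviation may be a better fit.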

Research Organization: Sandia National Laboratories
Sponsoring Organization: USDOE
DOE Contract Number: AC04-94AL85000
OSTI ID: 958384
Report Number(s): SAND2005-4558
Country of Publication: United States
Language: English