Monitoring computational clusters with OVIS.
Traditional cluster monitoring approaches consider nodes in isolation, using manufacturer-specified extreme limits as thresholds for failure "prediction." We have developed a tool, OVIS, for monitoring and analyzing large computational platforms. Instead of fixed limits, OVIS uses a statistical approach that characterizes each device's behavior relative to that of a large number of statistically similar devices. Baseline capabilities of OVIS include the visual display of deterministic information about state variables (e.g., temperature, CPU utilization, fan speed) and their aggregate statistics. Viewing the cluster as a comparative ensemble, rather than as isolated nodes, is an easy and useful way to tune cluster configuration and to determine the effects of real-time changes.
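The contrast between a per-node fixed limit and an ensemble comparison can be illustrated with a minimal sketch. This is not the OVIS implementation: it assumes one scalar reading per node (here, a hypothetical temperature), a hypothetical 85 °C manufacturer limit, and uses the median/MAD "modified z-score" as one common robust choice for comparing a node against its peers.

```python
from statistics import median


def singleton_alarms(readings, limit):
    """Per-node check: flag only readings above a fixed (e.g. vendor) limit."""
    return {node: v for node, v in readings.items() if v > limit}


def ensemble_outliers(readings, threshold=3.5):
    """Ensemble check: flag nodes that deviate strongly from their peers.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which is robust
    to the outliers it is trying to find. The 3.5 cutoff is a conventional
    default, assumed here for illustration.
    """
    values = list(readings.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return {}  # all peers (nearly) identical; no meaningful spread
    scores = {node: 0.6745 * (v - med) / mad for node, v in readings.items()}
    return {node: s for node, s in scores.items() if abs(s) > threshold}


# Hypothetical temperatures (deg C) for eight statistically similar nodes.
readings = {
    "n01": 41.0, "n02": 42.0, "n03": 40.0, "n04": 43.0,
    "n05": 41.0, "n06": 42.0, "n07": 58.0, "n08": 41.0,
}

print(singleton_alarms(readings, limit=85.0))  # nothing exceeds the fixed limit
print(ensemble_outliers(readings))             # n07 stands out from its peers
```

With these made-up values, no node crosses the fixed limit, yet n07 runs about 16 °C hotter than its peers and is flagged by the ensemble comparison, which is the kind of deviation a per-node threshold misses.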
- Research Organization:
- Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC04-94AL85000
- OSTI ID:
- 899078
- Report Number(s):
- SAND2006-7939; TRN: US200708%%24
- Country of Publication:
- United States
- Language:
- English
Similar Records
OVIS 2.0 user's guide.
OVIS 3.2 user's guide.