A Conceptual Framework for HPC Operational Data Analytics
- LRZ
- ORNL
- Leibniz Supercomputing Centre
- Hewlett Packard Enterprise
- Energy Efficient HPC Working Group
This paper provides a broad framework for understanding trends in Operational Data Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for the continuous monitoring, archiving, and analysis of near real-time performance data, providing immediately actionable information for multiple operational uses. In this work, we combine two models to provide a comprehensive HPC ODA framework: one is an evolutionary model of analytics capabilities that consists of four types (descriptive, diagnostic, predictive, and prescriptive), while the other is a four-pillar model for energy-efficient HPC operations that covers facility, system hardware, system software, and applications. This new framework is then overlaid with a description of current development and production deployments of ODA within leading-edge HPC facilities. Finally, we perform a comprehensive survey of ODA works and classify them according to our framework, in order to demonstrate its effectiveness.
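As an illustration only, the combination of the two models can be read as a 4 x 4 grid in which each surveyed work is placed by analytics type and by pillar. The following is a minimal sketch under that reading, not the authors' implementation: the names `AnalyticsType`, `Pillar`, `build_grid`, and `empty_cells` are assumptions introduced here, and the entries in `works` are hypothetical placeholders rather than works from the survey.

```python
from collections import defaultdict
from enum import Enum


class AnalyticsType(Enum):
    """The four analytics types named in the abstract."""
    DESCRIPTIVE = 1
    DIAGNOSTIC = 2
    PREDICTIVE = 3
    PRESCRIPTIVE = 4


class Pillar(Enum):
    """The four pillars of energy-efficient HPC operations named in the abstract."""
    FACILITY = "facility"
    SYSTEM_HARDWARE = "system hardware"
    SYSTEM_SOFTWARE = "system software"
    APPLICATIONS = "applications"


def build_grid(works):
    """Group (name, analytics_type, pillar) tuples by their grid cell."""
    grid = defaultdict(list)
    for name, analytics_type, pillar in works:
        grid[(analytics_type, pillar)].append(name)
    return grid


def empty_cells(grid):
    """Return the cells of the 4 x 4 grid that hold no classified work."""
    return [(a, p) for a in AnalyticsType for p in Pillar if not grid.get((a, p))]


# Hypothetical placeholder entries, not taken from the paper's survey.
works = [
    ("hypothetical-work-A", AnalyticsType.DESCRIPTIVE, Pillar.FACILITY),
    ("hypothetical-work-B", AnalyticsType.PREDICTIVE, Pillar.SYSTEM_HARDWARE),
    ("hypothetical-work-C", AnalyticsType.DESCRIPTIVE, Pillar.FACILITY),
]
grid = build_grid(works)

for (analytics_type, pillar), names in grid.items():
    print(f"{analytics_type.name.lower()} / {pillar.value}: {', '.join(names)}")
print(f"cells without any classified work: {len(empty_cells(grid))}")
```

Keying the grid on the pair of enum members keeps the two models independent, so either axis could be extended without touching the other; listing the empty cells is one simple way such a grid could expose areas with no classified work.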
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1820791
- Resource Relation:
- Conference: Energy Efficient HPC State of the Practice Workshop 2021 (CLUSTER 2021) - Portland, Oregon, United States of America - 9/7/2021
- Country of Publication:
- United States
- Language:
- English