U.S. Department of Energy
Office of Scientific and Technical Information

A Conceptual Framework for HPC Operational Data Analytics

Conference
This paper provides a broad framework for understanding trends in Operational Data Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for the continuous monitoring, archiving, and analysis of near real-time performance data, providing immediately actionable information for multiple operational uses. In this work, we combine two models to provide a comprehensive HPC ODA framework: one is an evolutionary model of analytics capabilities that consists of four types, which are descriptive, diagnostic, predictive and prescriptive, while the other is a four-pillar model for energy-efficient HPC operations that covers facility, system hardware, system software, and applications. This new framework is then overlaid with a description of current development and production deployments of ODA within leading-edge HPC facilities. Finally, we perform a comprehensive survey of ODA works and classify them according to our framework, in order to demonstrate its effectiveness.
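As a rough illustration of the two-model combination described in the abstract, the framework can be read as a grid of four analytics types by four pillars, with each surveyed work placed in one cell. The sketch below is not taken from the paper: the class names, the `build_grid` helper, and the example entries are all hypothetical and chosen only to show the shape of the classification.

```python
from collections import defaultdict
from enum import Enum


class AnalyticsType(Enum):
    """The four analytics types named in the abstract."""
    DESCRIPTIVE = "descriptive"
    DIAGNOSTIC = "diagnostic"
    PREDICTIVE = "predictive"
    PRESCRIPTIVE = "prescriptive"


class Pillar(Enum):
    """The four pillars named in the abstract."""
    FACILITY = "facility"
    SYSTEM_HARDWARE = "system hardware"
    SYSTEM_SOFTWARE = "system software"
    APPLICATIONS = "applications"


def build_grid(works):
    """Group (title, analytics_type, pillar) tuples by grid cell.

    Hypothetical helper: the paper's own classification procedure
    is not described in this record.
    """
    grid = defaultdict(list)
    for title, analytics_type, pillar in works:
        grid[(analytics_type, pillar)].append(title)
    return grid


# Hypothetical entries, invented for illustration only.
example_works = [
    ("cooling-loop dashboard", AnalyticsType.DESCRIPTIVE, Pillar.FACILITY),
    ("node failure forecasting", AnalyticsType.PREDICTIVE, Pillar.SYSTEM_HARDWARE),
    ("power-aware job scheduling", AnalyticsType.PRESCRIPTIVE, Pillar.SYSTEM_SOFTWARE),
]

grid = build_grid(example_works)

# Four analytics types times four pillars gives sixteen possible cells.
cell_count = len(AnalyticsType) * len(Pillar)
```

Keying the grid on an `(AnalyticsType, Pillar)` pair keeps the two models independent, so either axis could be extended without touching the other.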
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1820791
Country of Publication:
United States
Language:
English

Similar Records

Global Experiences with HPC Operational Data Measurement, Collection and Analysis
Conference · Tue Sep 01 2020 · OSTI ID:1706258

Navigating Exascale Operational Data Analytics: From Inundation to Insight
Conference · Fri Nov 01 2024 · OSTI ID:2538413

Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations
Conference · Tue Nov 21 2023 · OSTI ID:2282729
