Generating HPC Job Profiles and Expectations with Time-Series Data - Showcase Presentation
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Summary: Job profiles and expectations provide important insights into workloads. A job profile is a window into how a job is running; a job expectation answers whether that job is behaving as expected. Together they provide actionable information. Machine learning can be used to group job runs into workload types, and the identified groups can then be used to generate expectations. Profiles and expectations also enable the study of system-wide events and the tracking of system changes; system resource utilization and scheduling; and the marking of log data for further investigation or for failures/anomalies.
- Research Organization: Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization: USDOE Office of Science (SC). Advanced Scientific Computing Research (ASCR) (SC-21); USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number: 89233218CNA000001
- OSTI ID: 1645050
- Report Number(s): LA-UR--20-25734
- Country of Publication: United States
- Language: English
Similar Records
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis
Conference · Sat Nov 30 23:00:00 EST 2024 · OSTI ID:2500377
RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning
Conference · Sun Nov 01 00:00:00 EDT 2020 · OSTI ID:1777791
Power Profile Monitoring and Tracking Evolution of System-Wide HPC Workloads
Conference · Sun Jun 30 20:00:00 EDT 2024 · OSTI ID:2439873