Towards understanding HPC users and systems: A NERSC case study
Journal Article · Journal of Parallel and Distributed Computing
- Umeå University (Sweden); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Umeå University (Sweden)
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. HPC workloads now increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution in order to inform future scheduling research and enable efficient scheduling in future HPC systems. In this study, we present a methodology to characterize workloads and assess their heterogeneity, both at a particular time period and in its evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). Finally, we present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on a year of workload (2014) and its evolution through the systems’ lifetime (2010–2014).
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- European Union (EU); Swedish Research Council (VR); USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1439236
- Alternate ID(s):
- OSTI ID: 1463670; OSTI ID: 1495806
- Journal Information:
- Journal of Parallel and Distributed Computing, Vol. 111; ISSN 0743-7315
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
Improving Fairness in a Large Scale HTC System Through Workload Analysis and Simulation · book · January 2019
Similar Records
Towards understanding HPC users and systems: A NERSC case study
Journal Article · Sun Dec 31 23:00:00 EST 2017 · Journal of Parallel and Distributed Computing · OSTI ID:1463670
Checkpoint/Restart Vision and Strategies for NERSC’s Production Workloads
Technical Report · Wed Aug 18 00:00:00 EDT 2021 · OSTI ID:1814161
Parallel Scaling Characteristics of Selected NERSC User Project Codes
Technical Report · Fri Mar 04 23:00:00 EST 2005 · OSTI ID:885226