OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Towards understanding HPC users and systems: A NERSC case study

Abstract

High performance computing (HPC) scheduling landscape currently faces new challenges due to the changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand the current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems. In this paper, we present a methodology to characterize workloads and assess their heterogeneity, at a particular time period and its evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance that includes detailed information of a year of workload (2014) and evolution through the systems’ lifetime (2010–2014).

Authors:
Rodrigo, Gonzalo P. [1]; Östberg, P. -O. [2]; Elmroth, Erik [2]; Antypas, Katie [3]; Gerber, Richard [3]; Ramakrishnan, Lavanya [3]
  1. Umeå University, Umeå (Sweden). Department of Computing Science; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States).
  2. Umeå University, Umeå (Sweden). Department of Computing Science
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States).
Publication Date:
Research Org.:
Lawrence Berkeley National Laboratory, Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC).
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1463670
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Journal Article
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 111; Journal Issue: C; Journal ID: ISSN 0743-7315
Country of Publication:
United States
Language:
English

Citation Formats

Rodrigo, Gonzalo P., Östberg, P. -O., Elmroth, Erik, Antypas, Katie, Gerber, Richard, and Ramakrishnan, Lavanya. Towards understanding HPC users and systems: A NERSC case study. United States: N. p., 2018. Web. doi:10.1016/j.jpdc.2017.09.002.
Rodrigo, Gonzalo P., Östberg, P. -O., Elmroth, Erik, Antypas, Katie, Gerber, Richard, & Ramakrishnan, Lavanya. Towards understanding HPC users and systems: A NERSC case study. United States. doi:10.1016/j.jpdc.2017.09.002.
Rodrigo, Gonzalo P., Östberg, P. -O., Elmroth, Erik, Antypas, Katie, Gerber, Richard, and Ramakrishnan, Lavanya. Mon . "Towards understanding HPC users and systems: A NERSC case study". United States. doi:10.1016/j.jpdc.2017.09.002.
@article{osti_1463670,
title = {Towards understanding HPC users and systems: A NERSC case study},
author = {Rodrigo, Gonzalo P. and Östberg, P. -O. and Elmroth, Erik and Antypas, Katie and Gerber, Richard and Ramakrishnan, Lavanya},
abstractNote = {High performance computing (HPC) scheduling landscape currently faces new challenges due to the changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand the current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems. In this paper, we present a methodology to characterize workloads and assess their heterogeneity, at a particular time period and its evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance that includes detailed information of a year of workload (2014) and evolution through the systems’ lifetime (2010–2014).},
doi = {10.1016/j.jpdc.2017.09.002},
journal = {Journal of Parallel and Distributed Computing},
issn = {0743-7315},
number = {C},
volume = 111,
place = {United States},
year = {2018},
month = {1}
}