U.S. Department of Energy
Office of Scientific and Technical Information

Towards understanding HPC users and systems: A NERSC case study

Journal Article · Journal of Parallel and Distributed Computing
 [1];  [2];  [2];  [3];  [3];  [3]
  1. Umeå University, Umeå (Sweden). Department of Computing Science; Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
  2. Umeå University, Umeå (Sweden). Department of Computing Science
  3. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
High performance computing (HPC) scheduling landscape currently faces new challenges due to the changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand the current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems. In this paper, we present a methodology to characterize workloads and assess their heterogeneity, at a particular time period and its evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance that includes detailed information of a year of workload (2014) and evolution through the systems’ lifetime (2010–2014).
Research Organization:
Lawrence Berkeley National Laboratory, Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC).
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1463670
Journal Information:
Journal Name: Journal of Parallel and Distributed Computing; Journal Volume: 111; Journal Issue: C; ISSN 0743-7315
Country of Publication:
United States
Language:
English

Cited By (1)

Improving Fairness in a Large Scale HTC System Through Workload Analysis and Simulation book January 2019

Similar Records

Towards understanding HPC users and systems: A NERSC case study
Journal Article · Thu Sep 14 00:00:00 EDT 2017 · Journal of Parallel and Distributed Computing · OSTI ID:1439236

Checkpoint/Restart Vision and Strategies for NERSC’s Production Workloads
Technical Report · Wed Aug 18 00:00:00 EDT 2021 · OSTI ID:1814161

Parallel Scaling Characteristics of Selected NERSC User Project Codes
Technical Report · Fri Mar 04 23:00:00 EST 2005 · OSTI ID:885226
