Towards understanding HPC users and systems: A NERSC case study

Rodrigo, Gonzalo P.; Ostberg, P. -O.; Elmroth, Erik; Antypas, Katie; Gerber, Richard; Ramakrishnan, Lavanya

doi:10.1016/j.jpdc.2017.09.002


Journal Article · 14 September 2017 · Journal of Parallel and Distributed Computing

DOI: https://doi.org/10.1016/j.jpdc.2017.09.002 · OSTI ID: 1439236

Rodrigo, Gonzalo P. ^[1]; Ostberg, P. -O. ^[2]; Elmroth, Erik ^[2]; Antypas, Katie ^[3]; Gerber, Richard ^[3]; Ramakrishnan, Lavanya ^[3]

[1] Umeå University (Sweden); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
[2] Umeå University (Sweden)
[3] Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. Now, HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems. In this study, we present a methodology to characterize workloads and assess their heterogeneity, both within a particular time period and as it evolves over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). Finally, we present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on a year of workload (2014) and its evolution through the systems’ lifetime (2010–2014).

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: European Union (EU); Swedish Research Council (VR); USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: AC02-05CH11231

OSTI ID: 1439236

Alternate ID(s): OSTI ID: 1463670; OSTI ID: 1495806

Journal Information: Journal of Parallel and Distributed Computing, Vol. 111; ISSN 0743-7315

Publisher: Elsevier

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
HPC
NERSC
heterogeneity
k-means
scheduling
supercomputer
workload analysis
