Characteristics of workload on ASCI Blue-Pacific at Lawrence Livermore National Laboratory
Symmetric multiprocessor (SMP) clusters have become the prevalent computing platforms for large-scale scientific computation in recent years, mainly due to their good scalability. In fact, many parallel machines in use at supercomputing centers and national laboratories are of this type. On such large-scale parallel computers it is critical, and often very difficult, to efficiently manage a stream of jobs whose requirements for resources and computing time vary greatly. Understanding the characteristics of the workload imposed on a target environment plays a crucial role in managing system resources and developing an efficient resource management scheme. A parallel workload is typically analyzed by studying traces from actual production parallel machines. The study of workload traces not only gives system designers insight into how to design good processor allocation and job scheduling policies for efficient resource management, but also helps system administrators monitor and fine-tune resource management strategies and algorithms. Furthermore, workload traces are a valuable resource for those who conduct performance studies through either simulation or analytical modeling. The traces can be fed directly to a trace-driven simulator for more realistic and specific simulation experiments. Alternatively, one can analyze the traces to obtain parameters that characterize the workload, and then use them to construct a workload model or to drive a simulation in which a large number of runs are required. Considering these benefits, the authors collected and analyzed the job traces from ASCI Blue-Pacific, a 336-node IBM SP2 machine at Lawrence Livermore National Laboratory (LLNL). The job traces span a period of about six months, from October 1999 until the first week of May 2000. The IBM SP2 machine at LLNL uses gang-scheduling LoadLeveler (GangLL) to manage parallel jobs.
User jobs are submitted to GangLL via a locally developed resource manager called the "Distributed Production Control System" (DPCS). The DPCS prioritizes jobs based on a fair-share resource allocation hierarchy and uses a back-fill algorithm to optimize scheduling. The DPCS records all of its activities, as well as accounting information for user jobs, in a log, from which the authors collected the job traces. The log provides quite extensive information on jobs submitted to the DPCS, but the authors concentrate only on the information pertaining to the service and resource demands of the jobs. In this paper, the authors report on their workload study in three categories: job submission and execution characteristics, resource requirement analysis, and system utilization analysis. Submission and execution characteristics of a job include parameters pertaining to queueing activities in the system, such as job submission rate, job wait time, and job service time. In the resource requirement analysis, the demands for computing nodes and main memory from each job are analyzed. As part of the resource requirement analysis, they conducted a correlation analysis to determine whether there are any correlations between the various resource demands and job execution time. Finally, they analyzed how the system is used by groups of jobs exhibiting different resource demands and different submission and execution characteristics.
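The queueing parameters and the correlation analysis named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `JobRecord` fields are hypothetical and do not reflect the DPCS log format, the four sample jobs are invented, and Pearson's coefficient is used only as one common choice, since the abstract does not say which correlation measure the authors applied.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class JobRecord:
    """One job in a hypothetical trace (not the DPCS log layout)."""
    submit_time: float  # seconds since the start of the trace
    start_time: float   # when the job began executing
    end_time: float     # when the job finished
    nodes: int          # number of computing nodes requested


def wait_time(job: JobRecord) -> float:
    """Time spent queued between submission and start of execution."""
    return job.start_time - job.submit_time


def service_time(job: JobRecord) -> float:
    """Time spent executing."""
    return job.end_time - job.start_time


def submission_rate(jobs: list[JobRecord]) -> float:
    """Jobs per hour over the span between first and last submission."""
    times = [j.submit_time for j in jobs]
    span_hours = (max(times) - min(times)) / 3600.0
    return len(jobs) / span_hours


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Invented sample data, for illustration only.
jobs = [
    JobRecord(submit_time=0, start_time=60, end_time=660, nodes=4),
    JobRecord(submit_time=1800, start_time=1800, end_time=3000, nodes=8),
    JobRecord(submit_time=3600, start_time=4200, end_time=6000, nodes=16),
    JobRecord(submit_time=7200, start_time=7500, end_time=11100, nodes=32),
]

waits = [wait_time(j) for j in jobs]
services = [service_time(j) for j in jobs]
rate = submission_rate(jobs)
node_vs_service = pearson([float(j.nodes) for j in jobs], services)

print("wait times (s):", waits)
print("service times (s):", services)
print("submission rate (jobs/hour):", rate)
print("correlation(nodes, service time):", round(node_vs_service, 3))
```

On a real trace the same functions would be applied to records parsed from the log, and the correlation would be repeated for each resource demand of interest, such as memory against execution time.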
- Research Organization:
- Lawrence Livermore National Lab., CA (US)
- Sponsoring Organization:
- US Department of Energy (US)
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 15006159
- Report Number(s):
- UCRL-JC-140092
- Country of Publication:
- United States
- Language:
- English
Similar Records
Quantum Computing in the Cloud: Analyzing job and machine characteristics
Conference · Wed Jan 12 23:00:00 EST 2022 · 2021 IEEE International Symposium on Workload Characterization (IISWC) · OSTI ID:1865679
Adaptive job and resource management for the growing quantum cloud
Conference · Thu Nov 18 23:00:00 EST 2021 · OSTI ID:1865307
Evaluating HPC Scheduling Strategies for Urgent Workloads
Conference · Fri Oct 31 20:00:00 EDT 2025 · OSTI ID:3019946