OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Characteristics of workload on ASCI Blue-Pacific at Lawrence Livermore National Laboratory

Conference

Symmetric multiprocessor (SMP) clusters have become the prevalent computing platforms for large-scale scientific computation in recent years, mainly due to their good scalability. In fact, many parallel machines in use at supercomputing centers and national laboratories are of this type. On such large-scale parallel computers it is critical, and often very difficult, to efficiently manage a stream of jobs whose requirements for resources and computing time vary greatly. Understanding the characteristics of the workload imposed on a target environment plays a crucial role in managing system resources and developing an efficient resource management scheme. A parallel workload is typically analyzed by studying traces from actual production parallel machines. The study of workload traces not only gives system designers insight into how to design good processor allocation and job scheduling policies for efficient resource management, but also helps system administrators monitor and fine-tune resource management strategies and algorithms. Furthermore, workload traces are a valuable resource for those who conduct performance studies through either simulation or analytical modeling. The traces can be fed directly to a trace-driven simulator for more realistic and specific simulation experiments. Alternatively, one can obtain certain parameters that characterize the workload by analyzing the traces, and then use them to construct a workload model or to drive a simulation in which a large number of runs are required. Considering these benefits, the authors collected and analyzed the job traces from ASCI Blue-Pacific, a 336-node IBM SP2 machine at Lawrence Livermore National Laboratory (LLNL). The job traces span a period of about six months, from October 1999 until the first week of May 2000. The IBM SP2 machine at LLNL uses the gang-scheduling LoadLeveler (GangLL) to manage parallel jobs.
User jobs are submitted to GangLL via a locally developed resource manager called the "Distributed Production Control System" (DPCS). The DPCS prioritizes jobs based upon a fair-share resource allocation hierarchy and uses a back-fill algorithm to optimize scheduling. The DPCS records all of its activities, as well as accounting information for user jobs, in a log from which the authors collected the job traces. The log provides quite extensive information on jobs submitted to the DPCS, but the authors concentrate only on the information pertaining to the service and resource demands of the jobs. In this paper, the authors report on their workload study in three categories: job submission and execution characteristics, resource requirement analysis, and system utilization analysis. The submission and execution characteristics of a job include parameters pertaining to queueing activities in the system, such as job submission rate, job wait time, and job service time. In the resource requirement analysis, the demands for computing nodes and main memory from each job are analyzed. As part of the resource requirement analysis, they conducted a correlation analysis to determine whether there are any correlations between the various resource demands and job execution time. Finally, they analyzed how the system is used by groups of jobs exhibiting different resource demands and different submission and execution characteristics.
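The queueing parameters and the correlation analysis named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method or the DPCS log format, neither of which is given here. The `JobRecord` fields, the sample values, and the choice of the Pearson coefficient are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass
class JobRecord:
    # Hypothetical trace fields; the real DPCS log layout is not described in the abstract.
    submit: float  # submission time, in seconds from the start of the trace
    start: float   # time the job began executing, in seconds
    end: float     # time the job finished, in seconds
    nodes: int     # number of computing nodes requested


def wait_time(job):
    """Time spent queued between submission and start of execution."""
    return job.start - job.submit


def service_time(job):
    """Time spent executing."""
    return job.end - job.start


def submission_rate(jobs):
    """Jobs per hour over the span between the first and last submission."""
    span = max(j.submit for j in jobs) - min(j.submit for j in jobs)
    return len(jobs) / (span / 3600.0)


def pearson(xs, ys):
    """Pearson correlation coefficient (one possible correlation measure)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up sample records, not data from the ASCI Blue-Pacific traces.
jobs = [
    JobRecord(submit=0, start=60, end=660, nodes=4),
    JobRecord(submit=1800, start=1800, end=5400, nodes=16),
    JobRecord(submit=3600, start=4500, end=4800, nodes=1),
    JobRecord(submit=7200, start=7260, end=14460, nodes=32),
]

rate = submission_rate(jobs)
mean_wait = mean(wait_time(j) for j in jobs)
r_nodes_runtime = pearson([j.nodes for j in jobs],
                          [service_time(j) for j in jobs])
print(rate, mean_wait, r_nodes_runtime)
```

The same `pearson` helper could be applied to a memory-demand column if the trace records one. A real study would also need to handle jobs that never started or were cancelled, which this sketch ignores.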

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
US Department of Energy (US)
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15006159
Report Number(s):
UCRL-JC-140092; TRN: US200405%%297
Resource Relation:
Conference: International Association of Science and Technology for Development International Conference on Applied Informatics, Innsbruck (AT), 02/19/2001--02/22/2001; Other Information: PBD: 14 Aug 2000
Country of Publication:
United States
Language:
English