OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Characteristics of workload on ASCI Blue-Pacific at Lawrence Livermore National Laboratory

Abstract

Symmetric multiprocessor (SMP) clusters have become the prevalent computing platforms for large-scale scientific computation in recent years, mainly due to their good scalability. In fact, many parallel machines in use at supercomputing centers and national laboratories are of this type. On such large-scale parallel computers it is critical, and often very difficult, to efficiently manage a stream of jobs whose requirements for resources and computing time vary greatly. Understanding the characteristics of the workload imposed on a target environment plays a crucial role in managing system resources and developing an efficient resource management scheme. A parallel workload is typically analyzed by studying traces from actual production parallel machines. The study of workload traces not only provides system designers with insight into how to design good processor allocation and job scheduling policies for efficient resource management, but also helps system administrators monitor and fine-tune resource management strategies and algorithms. Furthermore, workload traces are a valuable resource for those who conduct performance studies through either simulation or analytical modeling. The traces can be fed directly to a trace-driven simulator for more realistic and specific simulation experiments. Alternatively, one can obtain parameters that characterize the workload by analyzing the traces, and then use them to construct a workload model or to drive a simulation in which a large number of runs are required. Considering these benefits, the authors collected and analyzed job traces from ASCI Blue-Pacific, a 336-node IBM SP2 machine at Lawrence Livermore National Laboratory (LLNL). The traces span a period of about six months, from October 1999 until the first week of May 2000. The IBM SP2 machine at LLNL uses gang-scheduling LoadLeveler (GangLL) to manage parallel jobs.
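As a toy illustration of the trace-analysis alternative described above, one might derive summary parameters for a workload model, such as mean inter-arrival time and mean service time, from a job trace. The record format and numbers below are invented for illustration and are not the DPCS log format:

```python
# Illustrative only: a made-up trace of (submit_time, nodes, run_time)
# records, from which two simple workload-model parameters are derived.

def workload_params(trace):
    """Return (mean inter-arrival time, mean service time) for a trace."""
    submits = sorted(t for t, _, _ in trace)
    gaps = [b - a for a, b in zip(submits, submits[1:])]
    mean_interarrival = sum(gaps) / len(gaps)
    mean_service = sum(run for _, _, run in trace) / len(trace)
    return mean_interarrival, mean_service

# Three hypothetical jobs: submitted at t=0, 30, 90 (seconds),
# requesting 64, 8, and 128 nodes, running 100, 20, and 400 seconds.
trace = [(0, 64, 100), (30, 8, 20), (90, 128, 400)]
ia, svc = workload_params(trace)
```

Parameters of this kind could then seed a statistical workload model or drive repeated simulation runs, as the abstract suggests.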
User jobs are submitted to GangLL via a locally developed resource manager called the ''Distributed Production Control System'' (DPCS). The DPCS prioritizes jobs based upon a fair-share resource allocation hierarchy and uses a back-fill algorithm to optimize scheduling. The DPCS records all of its activities, as well as accounting information for user jobs, in a log, from which the authors collected the job traces. The log provides quite extensive information on jobs submitted to the DPCS, but the authors concentrate only on the information pertaining to the service and resource demands of the jobs. In this paper, the authors report on their workload study in three categories: job submission and execution characteristics, resource requirement analysis, and system utilization analysis. Submission and execution characteristics of a job include parameters pertaining to queueing activities in the system, such as job submission rate, job wait time, and job service time. In resource requirement analysis, the demands for computing nodes and main memory from each job are analyzed. As part of the resource requirement analysis, the authors conducted a correlation analysis in an attempt to determine whether there are any correlations between various resource demands and job execution time. Finally, they analyzed how the system is used by groups of jobs exhibiting different resource demands and submission and execution characteristics.
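The back-fill step mentioned in the abstract can be sketched as follows. This is a minimal illustration of the general idea, not the DPCS implementation, and it omits the run-time reservations a production back-fill scheduler uses to keep large jobs from being starved:

```python
# Hypothetical sketch of back-fill scheduling: jobs are considered in
# priority order, and a lower-priority job may start early if it fits
# in the currently idle nodes. All names and sizes are illustrative.

def backfill_schedule(queue, free_nodes):
    """Pick jobs to start now from a priority-ordered queue.

    queue: list of (job_id, nodes_needed), highest priority first.
    free_nodes: number of currently idle nodes.
    Returns (job_ids started, jobs still waiting).
    """
    started, waiting = [], []
    for job_id, need in queue:
        if need <= free_nodes:
            started.append(job_id)
            free_nodes -= need
        else:
            waiting.append((job_id, need))  # skipped; smaller jobs may still fit
    return started, waiting

# 336 idle nodes, matching the node count of ASCI Blue-Pacific.
jobs = [("A", 200), ("B", 400), ("C", 64), ("D", 100)]
started, waiting = backfill_schedule(jobs, 336)
# "B" cannot fit, but the lower-priority "C" is back-filled around it.
```

A real scheduler would additionally use estimated run times to guarantee that back-filled jobs do not delay the start of the highest-priority waiting job.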

Authors:
Yoo, A B; Jette, M A
Publication Date:
14 Aug 2000
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
US Department of Energy (US)
OSTI Identifier:
15006159
Report Number(s):
UCRL-JC-140092
TRN: US200405%%297
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Conference
Resource Relation:
Conference: International Association of Science and Technology for Development International Conference on Applied Informatics, Innsbruck (AT), 02/19/2001--02/22/2001; Other Information: PBD: 14 Aug 2000
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; ARRAY PROCESSORS; COMPUTERS; DESIGN; LAWRENCE LIVERMORE NATIONAL LABORATORY; MONITORS; PERFORMANCE; PRODUCTION; RESOURCE MANAGEMENT; SIMULATION; SIMULATORS; TARGETS

Citation Formats

Yoo, A B, and Jette, M A. Characteristics of workload on ASCI Blue-Pacific at Lawrence Livermore National Laboratory. United States: N. p., 2000. Web. doi:10.1109/CCGRID.2001.923206.
Yoo, A B, & Jette, M A. Characteristics of workload on ASCI Blue-Pacific at Lawrence Livermore National Laboratory. United States. https://doi.org/10.1109/CCGRID.2001.923206
Yoo, A B, and Jette, M A. 2000. "Characteristics of workload on ASCI Blue-Pacific at Lawrence Livermore National Laboratory". United States. https://doi.org/10.1109/CCGRID.2001.923206. https://www.osti.gov/servlets/purl/15006159.
@article{osti_15006159,
title = {Characteristics of workload on ASCI Blue-Pacific at Lawrence Livermore National Laboratory},
author = {Yoo, A B and Jette, M A},
abstractNote = {Symmetric multiprocessor (SMP) clusters have become the prevalent computing platforms for large-scale scientific computation in recent years, mainly due to their good scalability. In fact, many parallel machines in use at supercomputing centers and national laboratories are of this type. On such large-scale parallel computers it is critical, and often very difficult, to efficiently manage a stream of jobs whose requirements for resources and computing time vary greatly. Understanding the characteristics of the workload imposed on a target environment plays a crucial role in managing system resources and developing an efficient resource management scheme. A parallel workload is typically analyzed by studying traces from actual production parallel machines. The study of workload traces not only provides system designers with insight into how to design good processor allocation and job scheduling policies for efficient resource management, but also helps system administrators monitor and fine-tune resource management strategies and algorithms. Furthermore, workload traces are a valuable resource for those who conduct performance studies through either simulation or analytical modeling. The traces can be fed directly to a trace-driven simulator for more realistic and specific simulation experiments. Alternatively, one can obtain parameters that characterize the workload by analyzing the traces, and then use them to construct a workload model or to drive a simulation in which a large number of runs are required. Considering these benefits, the authors collected and analyzed job traces from ASCI Blue-Pacific, a 336-node IBM SP2 machine at Lawrence Livermore National Laboratory (LLNL). The traces span a period of about six months, from October 1999 until the first week of May 2000. The IBM SP2 machine at LLNL uses gang-scheduling LoadLeveler (GangLL) to manage parallel jobs.
User jobs are submitted to GangLL via a locally developed resource manager called the ''Distributed Production Control System'' (DPCS). The DPCS prioritizes jobs based upon a fair-share resource allocation hierarchy and uses a back-fill algorithm to optimize scheduling. The DPCS records all of its activities, as well as accounting information for user jobs, in a log, from which the authors collected the job traces. The log provides quite extensive information on jobs submitted to the DPCS, but the authors concentrate only on the information pertaining to the service and resource demands of the jobs. In this paper, the authors report on their workload study in three categories: job submission and execution characteristics, resource requirement analysis, and system utilization analysis. Submission and execution characteristics of a job include parameters pertaining to queueing activities in the system, such as job submission rate, job wait time, and job service time. In resource requirement analysis, the demands for computing nodes and main memory from each job are analyzed. As part of the resource requirement analysis, the authors conducted a correlation analysis in an attempt to determine whether there are any correlations between various resource demands and job execution time. Finally, they analyzed how the system is used by groups of jobs exhibiting different resource demands and submission and execution characteristics.},
doi = {10.1109/CCGRID.2001.923206},
url = {https://www.osti.gov/biblio/15006159},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2000},
month = {8}
}

