A Generic Scheduling Simulator for High Performance Parallel Computers

Yoo, B S; Choi, G S; Jette, M A

Conference · Wed Aug 01 00:00:00 EDT 2001

OSTI ID:15006309

It is well known that efficient job scheduling plays a crucial role in achieving high system utilization in large-scale high performance computing environments. A good scheduling algorithm should achieve high system utilization while satisfying various user demands in an equitable fashion. Designing such an algorithm is a non-trivial task even in a static environment, and in practice the computing environment and workload are constantly changing. There are several reasons for this. First, computing platforms evolve as technology advances. For example, the availability of relatively powerful commodity off-the-shelf (COTS) components at steadily diminishing prices has made it feasible to construct ever larger massively parallel computers in recent years [1, 4]. Second, the workload imposed on the system also changes constantly. The rapidly increasing compute resources have given many application developers the opportunity to radically alter program characteristics to take advantage of these additional resources, and new developments in software technology may also trigger changes in user applications. Finally, changes in the political climate may alter user priorities or the mission of the organization. System designers in such dynamic environments must be able to accurately forecast the effect of proposed changes in hardware, software, and/or policies, and if the environmental changes are significant, they must also reassess their scheduling algorithms. Simulation has frequently been relied upon for this analysis, because other methods such as analytical modeling or actual measurement are usually too difficult or costly. A drawback of the simulation approach, however, is that developing a simulator is time-consuming, and an existing simulator cannot easily be adapted to a new environment.
In this research, we set out to develop a generic job-scheduling simulator that facilitates the evaluation of different scheduling algorithms in various computing environments. Our design objectives for this simulator are to: (1) accept descriptions of varied workloads for a wide range of computing environments; (2) provide an easy-to-use interface for describing the scheduling policies being evaluated; (3) accurately calculate the overhead induced by various scheduling algorithms; and (4) accurately model a variety of machine architectures. In summary, we have developed a generic scheduling simulator for high performance parallel computers. It supports standard and user-defined job attributes and generates job attribute values from different input sources, allowing users to model a wide range of workloads, and it produces performance parameters together with reliability measures. All overheads caused by the scheduling algorithms are taken into account in measuring the performance parameters. The simulator simulates a queuing network to which users can bind a specific scheduling algorithm written as a C function, and a set of APIs is provided to help users describe their scheduling algorithms. With these features, the simulator can accurately simulate any scheduling algorithm under various workloads and computing platforms. It does not currently model dynamic events such as message passing between tasks in detail, but we plan to add this crucial functionality in the future.

Research Organization:: Lawrence Livermore National Lab., CA (US)

Sponsoring Organization:: US Department of Energy (US)

DOE Contract Number:: W-7405-ENG-48

Report Number(s):: UCRL-JC-144818

Country of Publication:: United States

Language:: English

Similar Records

DRAS: Deep Reinforcement Learning for Cluster Scheduling in High Performance Computing

Journal Article · Fri Sep 16 00:00:00 EDT 2022 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1984484

RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning

Conference · Sun Nov 01 00:00:00 EDT 2020 · OSTI ID:1777791

Characteristics of workload on ASCI blue-pacific at lawrence livermore national laboratory

Conference · Mon Aug 14 00:00:00 EDT 2000 · OSTI ID:15006159

Related Subjects

99 GENERAL AND MISCELLANEOUS
ALGORITHMS
AVAILABILITY
CLIMATES
COMPUTERS
DESIGN
EVALUATION
LOS ALAMOS
PERFORMANCE
PRICES
RELIABILITY
SCHEDULES
SIMULATION
SIMULATORS