
Title: Efficient Subtorus Processor Allocation in a Multi-Dimensional Torus

Abstract

Processor allocation in a mesh or torus connected multicomputer system with up to three dimensions is a hard problem that has received some research attention in the past decade. With the recent deployment of multicomputer systems with a torus topology of dimensions higher than three, which are used to solve complex problems arising in scientific computing, it becomes imminent to study the problem of allocating processors of the configuration of a torus in a multi-dimensional torus connected system. In this paper, we first define the concept of a semitorus. We present two partition schemes, the Equal Partition (EP) and the Non-Equal Partition (NEP), that partition a multi-dimensional semitorus into a set of sub-semitori. We then propose two processor allocation algorithms based on these partition schemes. We evaluate our algorithms by incorporating them in commonly used FCFS and backfilling scheduling policies and conducting simulation using workload traces from the Parallel Workloads Archive. Specifically, our simulation experiments compare four algorithm combinations, FCFS/EP, FCFS/NEP, backfilling/EP, and backfilling/NEP, for two existing multi-dimensional torus connected systems. The simulation results show that our algorithms (especially the backfilling/NEP combination) are capable of producing schedules with system utilization and mean job bounded slowdowns comparable to those in a fully connected multicomputer.

Authors:
Weizhen Mao; Jie Chen; William Watson
Publication Date:
2005-11-30
Research Org.:
Thomas Jefferson National Accelerator Facility, Newport News, VA
Sponsoring Org.:
USDOE - Office of Energy Research (ER)
OSTI Identifier:
887122
Report Number(s):
JLAB-07-05-467; DOE/ER/40150-3984
TRN: US200617%%505
DOE Contract Number:
AC05-84ER40150
Resource Type:
Conference
Resource Relation:
Conference: 8th International Conference On High Performance Computing In Asia Pacific Region (HPC Asia 2005), 30 Nov - 3 Dec 2005, Beijing, China
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; PERFORMANCE; COMPUTER NETWORKS; COMPUTER ARCHITECTURE; MEMORY MANAGEMENT; MANY-DIMENSIONAL CALCULATIONS

Citation Formats

Weizhen Mao, Jie Chen, and William Watson. Efficient Subtorus Processor Allocation in a Multi-Dimensional Torus. United States: N. p., 2005. Web.
Weizhen Mao, Jie Chen, & William Watson. Efficient Subtorus Processor Allocation in a Multi-Dimensional Torus. United States.
Weizhen Mao, Jie Chen, and William Watson. 2005. "Efficient Subtorus Processor Allocation in a Multi-Dimensional Torus". United States. https://www.osti.gov/servlets/purl/887122.
@inproceedings{osti_887122,
title = {Efficient Subtorus Processor Allocation in a Multi-Dimensional Torus},
author = {Weizhen Mao and Jie Chen and William Watson},
abstractNote = {Processor allocation in a mesh or torus connected multicomputer system with up to three dimensions is a hard problem that has received some research attention in the past decade. With the recent deployment of multicomputer systems with a torus topology of dimensions higher than three, which are used to solve complex problems arising in scientific computing, it becomes imminent to study the problem of allocating processors of the configuration of a torus in a multi-dimensional torus connected system. In this paper, we first define the concept of a semitorus. We present two partition schemes, the Equal Partition (EP) and the Non-Equal Partition (NEP), that partition a multi-dimensional semitorus into a set of sub-semitori. We then propose two processor allocation algorithms based on these partition schemes. We evaluate our algorithms by incorporating them in commonly used FCFS and backfilling scheduling policies and conducting simulation using workload traces from the Parallel Workloads Archive. Specifically, our simulation experiments compare four algorithm combinations, FCFS/EP, FCFS/NEP, backfilling/EP, and backfilling/NEP, for two existing multi-dimensional torus connected systems. The simulation results show that our algorithms (especially the backfilling/NEP combination) are capable of producing schedules with system utilization and mean job bounded slowdowns comparable to those in a fully connected multicomputer.},
booktitle = {8th International Conference On High Performance Computing In Asia Pacific Region (HPC Asia 2005), 30 Nov - 3 Dec 2005, Beijing, China},
url = {https://www.osti.gov/servlets/purl/887122},
place = {United States},
year = {2005},
month = nov
}


  • The Computational Plant, or Cplant, is a commodity-based supercomputer under development at Sandia National Laboratories. This paper describes resource-allocation strategies to achieve processor locality for parallel jobs in Cplant and other supercomputers. Users of Cplant and other Sandia supercomputers submit parallel jobs to a job queue. When a job is scheduled to run, it is assigned to a set of processors. To obtain maximum throughput, jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This paper introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free list strategy previously used on Cplant. These new allocation strategies are implemented in the new release of the Cplant System Software, Version 2.0, phased into the Cplant systems at Sandia by May 2002.
  • An efficient numerical scheme has been developed for the solution of the finite-differenced pressure linked fluid flow equations. The algorithm solves the set of nonlinear simultaneous equations by a combination of Newton's method and efficient sparse matrix techniques. In tests on typical recirculating flows the method is rapidly convergent. The method does not require any under-relaxation or other convergence-enhancing techniques employed in other solution schemes. It is currently described for two-dimensional steady state flows but is extendible to three dimensions and mildly time-varying flows. The method is robust to changes in Reynolds number, grid aspect ratio, and mesh size. This paper reports the algorithm and the results of calculations performed.
  • This paper investigates the effectiveness of fine-grain decomposition in the context of the prototype dataflow machine now in operation at the University of Manchester. The current machine is a uniprocessor, known as the Single-Ring Dataflow Machine, comprising a single processing element which contains several units connected together in a pipelined ring. A Multi-ring Dataflow Machine (MDM), containing several such processing elements connected together via an interprocessor switching network, is currently under investigation. This paper describes a method of allocating dataflow instructions to processing elements in the MDM, and examines the influence of this method on selection of a switching network. Results obtained from simulation of the MDM are presented. They show that programs are executed efficiently when their parallelism is matched to the parallelism of the machine hardware.