OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Swift : fast, reliable, loosely coupled parallel computation.

Abstract

A common pattern in scientific computing involves the execution of many tasks that are coupled only in the sense that the output of one may be passed as input to one or more others - for example, as a file, or via a Web Services invocation. While such 'loosely coupled' computations can involve large amounts of computation and communication, the concerns of the programmer tend to be different than in traditional high performance computing, being focused on management issues relating to the large numbers of datasets and tasks (and often, the complexities inherent in 'messy' data organizations) rather than the optimization of interprocessor communication. To address these concerns, we have developed Swift, a system that combines a novel scripting language called SwiftScript with a powerful runtime system based on CoG Karajan and Falkon to allow for the concise specification, and reliable and efficient execution, of large loosely coupled computations. Swift adopts and adapts ideas first explored in the GriPhyN virtual data system, improving on that system in many regards. We describe the SwiftScript language and its use of XDTM to describe the logical structure of complex file system structures. We also present the Swift system and its use of CoG Karajan, Falkon, and Globus services to dispatch and manage the execution of many tasks in different execution environments. We summarize application experiences and detail performance experiments that quantify the cost of Swift operations.

Authors:
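Zhao, Y.; Hategan, M.; Clifford, B.; Foster, I.; von Laszewski, G.; Nefedova, V.; Raicu, I.; Stef-Praun, T.; Wilde, M.; Mathematics and Computer Science; Univ. of Chicago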
Publication Date:
2007
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Science Foundation (NSF); National Institutes of Health (NIH)
OSTI Identifier:
971148
Report Number(s):
ANL/MCS/CP-59297
TRN: US201003%%596
DOE Contract Number:
DE-AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: IEEE 2007 International Conference on Web Services (ICWS); Jul. 7, 2007 - Jul. 13, 2007; Salt Lake City, UT
Country of Publication:
United States
Language:
ENGLISH
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; MANAGEMENT; OPTIMIZATION; PERFORMANCE; PROGRAMMING

Citation Formats

Zhao, Y., Hategan, M., Clifford, B., Foster, I., von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., Wilde, M., Mathematics and Computer Science, and Univ. of Chicago. Swift : fast, reliable, loosely coupled parallel computation. United States: N. p., 2007. Web.
Zhao, Y., Hategan, M., Clifford, B., Foster, I., von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., Wilde, M., Mathematics and Computer Science, & Univ. of Chicago. Swift : fast, reliable, loosely coupled parallel computation. United States.
Zhao, Y., Hategan, M., Clifford, B., Foster, I., von Laszewski, G., Nefedova, V., Raicu, I., Stef-Praun, T., Wilde, M., Mathematics and Computer Science, and Univ. of Chicago. 2007. "Swift : fast, reliable, loosely coupled parallel computation." United States.
@article{osti_971148,
title = {Swift : fast, reliable, loosely coupled parallel computation.},
author = {Zhao, Y. and Hategan, M. and Clifford, B. and Foster, I. and von Laszewski, G. and Nefedova, V. and Raicu, I. and Stef-Praun, T. and Wilde, M. and {Mathematics and Computer Science} and {Univ. of Chicago}},
abstractNote = {A common pattern in scientific computing involves the execution of many tasks that are coupled only in the sense that the output of one may be passed as input to one or more others - for example, as a file, or via a Web Services invocation. While such 'loosely coupled' computations can involve large amounts of computation and communication, the concerns of the programmer tend to be different than in traditional high performance computing, being focused on management issues relating to the large numbers of datasets and tasks (and often, the complexities inherent in 'messy' data organizations) rather than the optimization of interprocessor communication. To address these concerns, we have developed Swift, a system that combines a novel scripting language called SwiftScript with a powerful runtime system based on CoG Karajan and Falkon to allow for the concise specification, and reliable and efficient execution, of large loosely coupled computations. Swift adopts and adapts ideas first explored in the GriPhyN virtual data system, improving on that system in many regards. We describe the SwiftScript language and its use of XDTM to describe the logical structure of complex file system structures. We also present the Swift system and its use of CoG Karajan, Falkon, and Globus services to dispatch and manage the execution of many tasks in different execution environments. We summarize application experiences and detail performance experiments that quantify the cost of Swift operations.},
place = {United States},
year = {2007},
month = {jan}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar records:
  • In this paper the implementation of a parallel O(log N) algorithm for computation of rigid multibody dynamics on a Hypercube MIMD parallel architecture is presented. To our knowledge, this is the first algorithm that achieves the time lower bound of O(log N) by using an optimal number of O(N) processors. However, in addition to its theoretical significance, the algorithm is also highly efficient for practical implementation on commercially available MIMD parallel architectures due to its highly coarse grain size and simple communication and synchronization requirements. We present a multilevel parallel computation strategy for implementation of the algorithm on a Hypercube. This strategy allows the exploitation of parallelism at several computational levels as well as maximum overlapping of computation and communication to increase the performance of parallel computation. 24 refs.
  • In pursuit of a general fault-tolerance scheme applicable to many loosely coupled networks (LCNs), this dissertation investigated software techniques for practical and cost-effective implementation of such a scheme, with emphasis on decentralized computation recovery. As a result, a scheme called the PTC/LCN scheme was developed that allows independent and uncoordinated design of error detection and recovery capabilities of distributed processes. Operational principles of the PTC/LCN scheme were devised to support the system design philosophy under which a process is allowed to exchange information which has not been completely validated, while each process is solely responsible for detecting and correcting errors that it originated. A graph-theoretic model was exploited to analyze and validate performance-related properties of the PTC/LCN scheme. In order to demonstrate the practicality of the PTC/LCN scheme, the scheme has been realized in a general architectural model of fault-tolerant LCNs. The focus of this architectural model is placed on design of an efficient and reliable communication protocol that allows exchange of pre-committed messages and is capable of recalling them when invalidated.
  • Distributed-memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared memory architectures provide a relatively low-level programming environment, and are poorly suited to modular programming, and to the construction of libraries. This paper describes a set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level. Focus here is on tensor-product-array computations, a simple but important class of numerical algorithms. The authors consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and after that look at how such parallel kernels can be combined to form parallel tensor-product algorithms.
  • This patent describes a network simulator for simulating a plurality of parallel processing networks. It comprises: a plurality of buses for transmitting information segments to processing sites; each of the buses includes data line means to transmit an information segment, at least one control line means to transmit control information related to the information segment, and a reply line means for indicating that another processing site is coupled to the bus to receive the information segment; sets of processing sites, each set being coupled to a given one of the plurality of buses, each processing site having a processor means and interface means coupling the processor to the bus; clock means coupled to each of the interface means to synchronize time intervals during which the respective interface means couples its corresponding processor means to its corresponding bus; time multiplex switching means coupled to each of the buses to receive an information segment from one of the buses for transmission on another of the buses; and each of the interface means including sequencing means coupled to the clock means to select during which time interval the corresponding processor means is to be coupled to its respective bus.