OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Cooperative fault-tolerant distributed computing U.S. Department of Energy Grant DE-FG02-02ER25537 Final Report

Abstract

The Harness project has developed novel software frameworks for executing high-end simulations in a fault-tolerant manner on distributed resources. The H2O subsystem forms the kernel of the Harness framework and controls the key functions of resource management across multiple administrative domains, particularly access and allocation. It is based on a “pluggable” architecture that enables the aggregated use of distributed heterogeneous resources for high performance computing. The major contributions of the Harness II project significantly enhance the overall computational productivity of high-end scientific applications by enabling robust, failure-resilient computations on cooperatively pooled resource collections.

Authors:
Sunderam, Vaidy S.
Publication Date:
2007-01-09
Research Org.:
Emory University, Atlanta, GA
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
916972
Report Number(s):
DOE/ER/25537-1
TRN: US201006%%621
DOE Contract Number:
FG02-02ER25537
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ARCHITECTURE; KERNELS; PERFORMANCE; PRODUCTIVITY; RESOURCE MANAGEMENT; Distributed computing, fault tolerance, high performance computing

Citation Formats

Sunderam, Vaidy S. Cooperative fault-tolerant distributed computing U.S. Department of Energy Grant DE-FG02-02ER25537 Final Report. United States: N. p., 2007. Web. doi:10.2172/916972.
Sunderam, Vaidy S. (2007). Cooperative fault-tolerant distributed computing U.S. Department of Energy Grant DE-FG02-02ER25537 Final Report. United States. doi:10.2172/916972.
Sunderam, Vaidy S. 2007. "Cooperative fault-tolerant distributed computing U.S. Department of Energy Grant DE-FG02-02ER25537 Final Report". United States. doi:10.2172/916972. https://www.osti.gov/servlets/purl/916972.
@article{osti_916972,
title = {Cooperative fault-tolerant distributed computing U.S. Department of Energy Grant DE-FG02-02ER25537 Final Report},
author = {Sunderam, Vaidy S.},
doi = {10.2172/916972},
place = {United States},
year = {2007},
month = jan
}
