Skip to main content
U.S. Department of Energy
Office of Scientific and Technical Information
  1. The HARNESS Workbench: Unified and Adaptive Access to Diverse HPC Platforms

    The primary goal of the Harness WorkBench (HWB) project is to investigate innovative software environments that will help enhance the overall productivity of applications science on diverse HPC platforms. Two complementary frameworks were designed: one, a virtualized command toolkit for application building, deployment, and execution, that provides a common view across diverse HPC systems, in particular the DOE leadership computing platforms (Cray, IBM, SGI, and clusters); and two, a unified runtime environment that consolidates access to runtime services via an adaptive framework for execution-time and post processing activities. A prototype of the first was developed based on the concept of a 'system-call virtual machine' (SCVM), to enhance portability of the HPC application deployment process across heterogeneous high-end machines. The SCVM approach to portable builds is based on the insertion of toolkit-interpretable directives into original application build scripts. Modifications resulting from these directives preserve the semantics of the original build instruction flow. The execution of the build script is controlled by our toolkit that intercepts build script commands in a manner transparent to the end-user. We have applied this approach to a scientific production code (Gamess-US) on the Cray-XT5 machine. The second facet, termed Unibus, aims to facilitate provisioning and aggregation of multifaceted resources from resource providers and end-users perspectives. To achieve that, Unibus proposes a Capability Model and mediators (resource drivers) to virtualize access to diverse resources, and soft and successive conditioning to enable automatic and user-transparent resource provisioning. A proof of concept implementation has demonstrated the viability of this approach on high end machines, grid systems and computing clouds.

  2. Cooperative fault-tolerant distributed computing U.S. Department of Energy Grant DE-FG02-02ER25537 Final Report

    The Harness project has developed novel software frameworks for the execution of high-end simulations in a fault-tolerant manner on distributed resources. The H2O subsystem comprises the kernel of the Harness framework, and controls the key functions of resource management across multiple administrative domains, especially issues of access and allocation. It is based on a “pluggable” architecture that enables the aggregated use of distributed heterogeneous resources for high performance computing. The major contributions of the Harness II project result in significantly enhancing the overall computational productivity of high-end scientific applications by enabling robust, failure-resilient computations on cooperatively pooled resource collections.


Search for:
All Records
Author / Contributor
"Sunderam, Vaidy S"

Refine by:
Resource Type
Availability
Author / Contributor
Research Organization