skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Hierarchical resilience with lightweight threads.

Abstract

This paper proposes methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One of the trends in high-performance computing codes seems to be a trend toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specified in side-effect free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be - whether that's the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model frommore » LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).« less

Authors:
Publication Date:
Research Org.:
Sandia National Laboratories
Sponsoring Org.:
USDOE
OSTI Identifier:
1029809
Report Number(s):
SAND2011-7604
TRN: US1200049
DOE Contract Number:  
AC04-94AL85000
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
43 PARTICLE ACCELERATORS; ACCELERATORS; COMMUNICATIONS; DESIGN; FUNCTIONALS; KERNELS; PROGRAMMING

Citation Formats

Wheeler, Kyle Bruce. Hierarchical resilience with lightweight threads.. United States: N. p., 2011. Web. doi:10.2172/1029809.
Wheeler, Kyle Bruce. Hierarchical resilience with lightweight threads.. United States. doi:10.2172/1029809.
Wheeler, Kyle Bruce. Sat . "Hierarchical resilience with lightweight threads.". United States. doi:10.2172/1029809. https://www.osti.gov/servlets/purl/1029809.
@article{osti_1029809,
title = {Hierarchical resilience with lightweight threads.},
author = {Wheeler, Kyle Bruce},
abstractNote = {This paper proposes methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One of the trends in high-performance computing codes seems to be a trend toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specified in side-effect free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be - whether that's the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).},
doi = {10.2172/1029809},
journal = {},
number = ,
volume = ,
place = {United States},
year = {2011},
month = {10}
}

Technical Report:

Save / Share: