OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Localized Fault Recovery for Nested Fork-Join Programs

Abstract

Nested fork-join programs scheduled using work stealing can automatically balance load and adapt to changes in the execution environment. In this paper, we design an approach to efficiently recover from faults encountered by these programs. Specifically, we focus on localized recovery of the task space in the presence of fail-stop failures. We present an approach to efficiently track, under work stealing, the relationships between the work executed by various threads. This information is used to identify and schedule the tasks to be re-executed without interfering with normal task execution. The algorithm precisely computes the work lost, incurs minimal re-execution overhead, and can recover from an arbitrary number of failures. Experimental evaluation demonstrates low overheads in the absence of failures, recovery overheads on the same order as the lost work, and much lower recovery costs than alternative strategies.
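The idea of localized recovery described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm: it assumes a simple recursive range sum as the nested fork-join program, uses random worker assignment as a stand-in for work stealing, and records a hypothetical `owner` field per leaf task. After a simulated fail-stop failure, only the leaves executed by the failed worker are re-executed, so the re-execution cost matches the work lost.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """A node in a nested fork-join task tree covering the range [lo, hi)."""
    lo: int
    hi: int
    children: List["Task"] = field(default_factory=list)
    owner: Optional[int] = None   # worker that ran this leaf (hypothetical bookkeeping)
    result: Optional[int] = None  # leaf result; None means not yet computed or lost


def build(lo, hi, grain):
    """Fork the range into two halves until it is at most `grain` elements long."""
    task = Task(lo, hi)
    if hi - lo > grain:
        mid = (lo + hi) // 2
        task.children = [build(lo, mid, grain), build(mid, hi, grain)]
    return task


def leaves(task):
    """Yield the leaf tasks of the tree in left-to-right order."""
    if not task.children:
        yield task
    else:
        for child in task.children:
            yield from leaves(child)


def execute(task, workers, rng):
    """Run every leaf that has no result on a randomly chosen live worker.

    Random assignment stands in for work stealing (an assumption of this sketch).
    Returns the number of leaves executed.
    """
    executed = 0
    for leaf in leaves(task):
        if leaf.result is None:
            leaf.owner = rng.choice(workers)
            leaf.result = sum(range(leaf.lo, leaf.hi))
            executed += 1
    return executed


def fail(task, worker):
    """Simulate a fail-stop failure: discard every result produced by `worker`.

    Returns the number of leaves whose work was lost.
    """
    lost = 0
    for leaf in leaves(task):
        if leaf.owner == worker:
            leaf.owner = None
            leaf.result = None
            lost += 1
    return lost


def join(task):
    """Combine results bottom-up, mirroring the join side of fork-join."""
    if not task.children:
        return task.result
    return sum(join(child) for child in task.children)


rng = random.Random(0)
root = build(0, 1000, grain=50)
execute(root, [0, 1, 2, 3], rng)     # normal execution on four workers
lost = fail(root, 2)                 # worker 2 fails
redone = execute(root, [0, 1, 3], rng)  # survivors re-execute only the lost leaves
print(lost, redone, join(root) == sum(range(1000)))
```

In this toy model the surviving workers' results are left untouched, which is the sense in which recovery is localized. A real work-stealing runtime would additionally need to track steal relationships and partially completed subtrees, which this sketch does not attempt.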

Authors:
Kestor, Gokcen; Krishnamoorthy, Sriram; Ma, Wenjing
Publication Date:
2017-07-03
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1379446
Report Number(s):
PNNL-SA-123481
KJ0402000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS 2017), May 29-June 2, 2017, Orlando, Florida, 397-408
Country of Publication:
United States
Language:
English

Citation Formats

Kestor, Gokcen, Krishnamoorthy, Sriram, and Ma, Wenjing. Localized Fault Recovery for Nested Fork-Join Programs. United States: N. p., 2017. Web. doi:10.1109/IPDPS.2017.75.
Kestor, Gokcen, Krishnamoorthy, Sriram, & Ma, Wenjing. Localized Fault Recovery for Nested Fork-Join Programs. United States. doi:10.1109/IPDPS.2017.75.
Kestor, Gokcen, Krishnamoorthy, Sriram, and Ma, Wenjing. 2017. "Localized Fault Recovery for Nested Fork-Join Programs". United States. doi:10.1109/IPDPS.2017.75.
@article{osti_1379446,
title = {Localized Fault Recovery for Nested Fork-Join Programs},
author = {Kestor, Gokcen and Krishnamoorthy, Sriram and Ma, Wenjing},
abstractNote = {Nested fork-join programs scheduled using work stealing can automatically balance load and adapt to changes in the execution environment. In this paper, we design an approach to efficiently recover from faults encountered by these programs. Specifically, we focus on localized recovery of the task space in the presence of fail-stop failures. We present an approach to efficiently track, under work stealing, the relationships between the work executed by various threads. This information is used to identify and schedule the tasks to be re-executed without interfering with normal task execution. The algorithm precisely computes the work lost, incurs minimal re-execution overhead, and can recover from an arbitrary number of failures. Experimental evaluation demonstrates low overheads in the absence of failures, recovery overheads on the same order as the lost work, and much lower recovery costs than alternative strategies.},
doi = {10.1109/IPDPS.2017.75},
place = {United States},
year = {2017},
month = {7}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

  • We present an approach to improving data locality across different phases of fork/join programs scheduled using work stealing. The approach consists of: (1) user-specified and automated approaches to constructing a steal tree, the schedule of steal operations and (2) constrained work stealing algorithms that constrain the actions of the scheduler to mirror a given steal tree. These are combined to construct work stealing schedules that maximize data locality across computation phases while ensuring load balance within each phase. These algorithms are also used to demonstrate dynamic coarsening, an optimization to improve spatial locality and sequential overheads by combining many finer-grained tasks into coarser tasks while ensuring sufficient concurrency for locality-optimized load balance. Implementation and evaluation in Cilk demonstrate performance improvements of up to 2.5x on 80 cores. We also demonstrate that dynamic coarsening can combine the performance benefits of coarse task specification with the adaptability of finer tasks.
  • Field relations indicate multiple sequences of late Cenozoic basalt flowed down the canyon of the North Fork Feather River from the Modoc Plateau during the Pliocene and early Quaternary. Remnants of at least three flow sequences are exposed in the canyon, the intermediate one yielding a K/Ar plagioclase date of 1.8 Ma. Topographic profiling of the remnants allows identification of Quaternary tectonic deformation along the northern Plumas trench, which separates the Sierra Nevada from the Diamond Mountains. The authors have identified several vertical displacements of the 1.8-Ma unit in the North Fork canyon and the area NE of Lake Almanor. NE of the lake, three NW-striking faults, each having down-to-the-west displacements of up to 35 m, are related to faulting along the east side of the Almanor tectonic depression. Analysis of the displaced basalt flows suggests that uplift of the Sierra Nevada occurred with canyon development prior to 2 Ma, and has continued coincident with several subsequent episodes of basalt deposition. Quaternary faulting of the basalt is associated with the Melones fault zone and the Plumas trench where they extend northward from the northern Sierra Nevada into the Modoc Plateau and southern Cascades. In contrast to the Mohawk Valley area, where the Plumas trench forms a 5-km-wide graben, faulting in the Almanor region is distributed over a 15-km-wide zone. A change in the strike of faulting occurs at Lake Almanor, from N50W along the Plumas trench to N20W north of the lake. The right-slip component on the fault of the Plumas trench may result in a releasing bend at the change in strike and explain the origin of the Almanor depression.
  • High-pressure metamorphic blocks within serpentine occur between the South Mtn Schist, Jsfm, (Pickett Peak terrane) of the Franciscan Complex and rocks of the Western Jurassic belt of the Klamaths in the Willow Creek Quadrangle. These rounded to ellipsoidal tectonic blocks range from 1 to 50 meters and have foliated chlorite ± actinolite rinds which presumably resulted from interaction between the mafic blocks and host serpentinite. Metamorphic assemblages within the blocks include lawsonite + glaucophane + chlorite + pumpellyite + albite and subsets of the above with omphacite ± quartz ± phengite ± epidote ± aragonite. Relic igneous augite is common, sometimes with omphacite or glaucophane overgrowths. Veins of pumpellyite or fibrous to granular omphacite are rare and epidote + albite ± calcite veins are common. Whole-rock chemistry, initial ⁸⁷Sr/⁸⁶Sr, and relic augite compositions indicate perhaps both island arc and ocean floor basalt protoliths. Whole rock δ¹⁸O range from 4.7 to 10.2 permil indicating both low- and high-temperature alteration of the protolith in a seafloor setting. These blocks are chemically, mineralogically, and texturally similar to tectonic blocks in the Central Franciscan belt and to metabasites in the Pickett Peak terrane to the south where the Coast Range fault places Jsfm against the Coast Range Ophiolite (CRO), yet omphacite-bearing tectonic blocks in serpentinite have not been reported anywhere else along the South Fork fault adjacent to the Klamath Mountains. These unique blocks are associated with a particularly wide zone of ultramafic rocks, at least some of which appear to have upper-plate Josephine Ophiolite affinity. However, the tectonic block-bearing serpentinite is intercalated by imbricate thrusting with Jsfm suggesting an affinity to the CRO.
  • Nested decomposition of linear programs is the result of a multilevel, hierarchical application of the Dantzig-Wolfe decomposition principle. The general structure is called lower block-triangular, and permits direct accounting of long-term effects of investment, service life, etc. LIFT, an algorithm for solving lower block triangular linear programs, is based on state-of-the-art modular LP software. The algorithmic and software aspects of LIFT are outlined, and computational results are presented. 5 figures, 6 tables. (RWR)
  • In this paper, the authors develop analytic models for a shared memory multiprocessor that executes fork-join parallel programs. Here a fork-join program is one that consists of a set of n ≥ 1 parallel tasks. All of the tasks of a program arrive simultaneously to the system and the job is assumed to complete when the last task completes. They develop and analyze models for two processor sharing policies, called task scheduling processor sharing and job scheduling processor sharing. The first policy schedules tasks independently of each other and allows parallel execution of an individual program, whereas the second policy schedules each job as a unit and thereby does not allow parallel execution of an individual program.