
Title: Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications

Abstract

Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability they pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of the two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) task-level fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.

Authors:
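Balasubramanian, Vivek; Turilli, Matteo; Hu, Weiming; Lefebvre, Matthieu; Lei, Wenjie; Modrak, Ryan; Cervone, Guido; Tromp, Jeroen; Jha, Shantenu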
Publication Date:
Research Org.:
UT-Battelle LLC/ORNL, Oak Ridge, TN (United States); Univ. of Texas, Arlington, TX (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1567488
DOE Contract Number:  
AC05-00OR22725; SC0016280
Resource Type:
Conference
Journal Name:
2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)
Additional Journal Information:
Conference: 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), Vancouver, BC, Canada, 21-25 May, 2018.
Country of Publication:
United States
Language:
English
Subject:
Computer Science; Engineering

Citation Formats

Balasubramanian, Vivek, Turilli, Matteo, Hu, Weiming, Lefebvre, Matthieu, Lei, Wenjie, Modrak, Ryan, Cervone, Guido, Tromp, Jeroen, and Jha, Shantenu. Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications. United States: N. p., 2018. Web. doi:10.1109/IPDPS.2018.00063.
Balasubramanian, Vivek, Turilli, Matteo, Hu, Weiming, Lefebvre, Matthieu, Lei, Wenjie, Modrak, Ryan, Cervone, Guido, Tromp, Jeroen, & Jha, Shantenu. Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications. United States. doi:10.1109/IPDPS.2018.00063.
Balasubramanian, Vivek, Turilli, Matteo, Hu, Weiming, Lefebvre, Matthieu, Lei, Wenjie, Modrak, Ryan, Cervone, Guido, Tromp, Jeroen, and Jha, Shantenu. 2018. "Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications". United States. doi:10.1109/IPDPS.2018.00063.
@article{osti_1567488,
title = {Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications},
author = {Balasubramanian, Vivek and Turilli, Matteo and Hu, Weiming and Lefebvre, Matthieu and Lei, Wenjie and Modrak, Ryan and Cervone, Guido and Tromp, Jeroen and Jha, Shantenu},
abstractNote = {Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability they pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of the two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) task-level fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.},
doi = {10.1109/IPDPS.2018.00063},
journal = {2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {5}
}

