OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ActiveSpaces: Exploring dynamic code deployment for extreme scale data processing

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.3407 · OSTI ID: 1559755

Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership-class resources has become a critical challenge. The data have to be extracted from the compute nodes and transported to consumer nodes so that they can be processed, analyzed, visualized, archived, and so on. Several recent research efforts have addressed data-related challenges at different levels. One attractive approach is to offload expensive input/output operations to a smaller set of dedicated compute nodes known as a staging area. However, even with this approach, the data still have to be moved from the staging area to consumer nodes for processing, which remains a bottleneck. In this paper, we investigate an alternative approach: moving the data-processing code to the staging area instead of moving the data to the data-processing code. Specifically, we describe the ActiveSpaces framework, which provides (1) programming support for defining the data-processing routines to be downloaded to the staging area and (2) runtime mechanisms for transporting the code associated with these routines to the staging area, executing the routines on the staging nodes, and returning the results. We also present an experimental performance evaluation of ActiveSpaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade-offs between transporting data and transporting the code required for data processing during coupling, and we characterize the sweet spots for each option.
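The abstract's central idea, shipping a data-processing routine to where the data already sit rather than shipping the data to the routine, can be sketched in a few lines. The example below is a hypothetical, single-process Python illustration, not the ActiveSpaces API: `ToyStagingArea`, `put`, `get`, `execute`, `local_max`, and `compare` are invented names, and Python's `marshal` serialization of a function's code object stands in for whatever mechanism ActiveSpaces actually uses to transport code. It counts the bytes that would cross the staging/consumer boundary under each strategy for a simple max reduction.

```python
import builtins
import marshal
import types
from array import array


class ToyStagingArea:
    """Hypothetical in-process stand-in for a staging node (not the ActiveSpaces API).

    It stores named arrays of doubles and counts every byte that crosses the
    staging/consumer boundary, so the two strategies can be compared.
    """

    def __init__(self):
        self._store = {}
        self.bytes_moved = 0

    def put(self, name, values):
        # The simulation writes into staging; this cost is the same for both
        # strategies, so it is not counted.
        self._store[name] = array("d", values)

    def get(self, name):
        """Move the data to the code: ship the whole array to the consumer."""
        payload = self._store[name].tobytes()
        self.bytes_moved += len(payload)
        data = array("d")
        data.frombytes(payload)
        return data

    def execute(self, name, code_bytes):
        """Move the code to the data: rebuild the routine, run it in place, return the result."""
        self.bytes_moved += len(code_bytes)
        routine = types.FunctionType(marshal.loads(code_bytes), {"__builtins__": builtins})
        result_bytes = marshal.dumps(routine(self._store[name]))
        self.bytes_moved += len(result_bytes)
        return marshal.loads(result_bytes)


def local_max(values):
    """Example data-processing routine: a reduction whose result is tiny."""
    return max(values)


def compare(n=1_000_000):
    values = [float(i % 1000) for i in range(n)]

    pull = ToyStagingArea()
    pull.put("field", values)
    result_pull = local_max(pull.get("field"))

    push = ToyStagingArea()
    push.put("field", values)
    # marshal serializes only the code object, and only between identical
    # Python versions; a real system would ship compiled code instead.
    result_push = push.execute("field", marshal.dumps(local_max.__code__))

    return result_pull, pull.bytes_moved, result_push, push.bytes_moved


if __name__ == "__main__":
    r1, b1, r2, b2 = compare()
    print(f"move data: result={r1}, bytes moved={b1}")
    print(f"move code: result={r2}, bytes moved={b2}")
```

With the default one million doubles, the data path moves 8,000,000 bytes, while the code path moves only the serialized routine plus a single float. In this toy model, pushing code pays off when a routine's result is much smaller than its input; a routine that returns an array as large as its input would move about as many bytes either way. This is only a simplified way to reason about the kind of trade-off the abstract describes, not the paper's performance model.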

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); USDOE Office of Science (SC), Fusion Energy Sciences (FES)
Contributing Organization:
L
Grant/Contract Number:
AC05-00OR22725; SC0007455; FG02-06ER54857; ACI 1339036; ACI 1310283; DMS 1228203; IIP 0758566
OSTI ID:
1559755
Alternate ID(s):
OSTI ID: 1401825
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 27, Issue 14; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 8 works (citation information provided by Web of Science)
