OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Impact of Burst Buffer Architectures on Application Portability

Abstract

The Oak Ridge and Argonne Leadership Computing Facilities are both receiving new systems under the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program. Because they are both part of the INCITE program, applications need to be portable between these two facilities. However, the Summit and Aurora systems will be vastly different architectures, including their I/O subsystems. While both systems will have POSIX-compliant parallel file systems, their Burst Buffer technologies will be different. This difference may pose challenges to application portability between facilities. Application developers need to pay attention to specific burst buffer implementations to maximize code portability.

Authors:
 Harms, Kevin [1]; Oral, H. Sarp [2]; Atchley, Scott [2]; Vazhkudai, Sudharshan S. [2]
  1. Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). National Center for Computational Science
Publication Date:
September 2016
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1328312
Report Number(s):
ORNL/TM-2016/415
KJ0502000; ERKJZN1
DOE Contract Number:
AC05-00OR22725; AC02-06CH11357
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Harms, Kevin, Oral, H. Sarp, Atchley, Scott, and Vazhkudai, Sudharshan S. Impact of Burst Buffer Architectures on Application Portability. United States: N. p., 2016. Web. doi:10.2172/1328312.
Harms, Kevin, Oral, H. Sarp, Atchley, Scott, & Vazhkudai, Sudharshan S. Impact of Burst Buffer Architectures on Application Portability. United States. doi:10.2172/1328312.
Harms, Kevin, Oral, H. Sarp, Atchley, Scott, and Vazhkudai, Sudharshan S. 2016. "Impact of Burst Buffer Architectures on Application Portability". United States. doi:10.2172/1328312. https://www.osti.gov/servlets/purl/1328312.
@techreport{osti_1328312,
title = {Impact of Burst Buffer Architectures on Application Portability},
author = {Harms, Kevin and Oral, H. Sarp and Atchley, Scott and Vazhkudai, Sudharshan S.},
abstractNote = {The Oak Ridge and Argonne Leadership Computing Facilities are both receiving new systems under the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program. Because they are both part of the INCITE program, applications need to be portable between these two facilities. However, the Summit and Aurora systems will be vastly different architectures, including their I/O subsystems. While both systems will have POSIX-compliant parallel file systems, their Burst Buffer technologies will be different. This difference may pose challenges to application portability between facilities. Application developers need to pay attention to specific burst buffer implementations to maximize code portability.},
doi = {10.2172/1328312},
institution = {Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)},
number = {ORNL/TM-2016/415},
place = {United States},
year = 2016,
month = 9
}

  • Recent high-performance computing (HPC) platforms such as the Trinity Advanced Technology System (ATS-1) feature burst buffer resources that can have a dramatic impact on an application’s I/O performance. While these non-volatile memory (NVM) resources provide a new tier in the storage hierarchy, developers must find the right way to incorporate the technology into their applications in order to reap the benefits. Similar to other laboratories, Sandia is actively investigating ways in which these resources can be incorporated into our existing libraries and workflows without burdening our application developers with excessive, platform-specific details. This FY18Q1 milestone summarizes our progress in adapting the Sandia Parallel Aerodynamics and Reentry Code (SPARC) in Sandia’s ATDM program to leverage Trinity’s burst buffers for checkpoint/restart operations. We investigated four different approaches with varying tradeoffs in this work: (1) simply updating the job script to use stage-in/stage-out burst buffer directives, (2) modifying SPARC to use LANL’s hierarchical I/O (HIO) library to store/retrieve checkpoints, (3) updating Sandia’s IOSS library to incorporate the burst buffer in all meshing I/O operations, and (4) modifying SPARC to use our Kelpie distributed memory library to store/retrieve checkpoints. Team members were successful in generating initial implementations for all four approaches, but were unable to obtain performance numbers in time for this report (reasons: initial problem sizes were not large enough to stress I/O, and the SPARC refactor will require changes to our code). When we presented our work to the SPARC team, they expressed the most interest in the second and third approaches. The HIO work was favored because it is lightweight, unobtrusive, and should be portable to ATS-2. The IOSS work is seen as a long-term solution, and is favored because all I/O work (including checkpoints) can be deferred to a single library.
  • No abstract provided.
  • The most significant impact on research in Scientific Computation, and Numerical Linear Algebra in particular, seems to have been brought about by the advent of vector and parallel computation. This paper presents a short survey of recent work on parallel implementations of Numerical Linear Algebra algorithms, with emphasis on those relating to the solution of the symmetric eigenvalue problem on loosely coupled multiprocessor architectures. A simple model is given to analyze the complexity of parallel algorithms on several representative multiprocessor systems: a linear processor array (or ring), a two-dimensional processor grid, and the hypercube. The vital operations in the formulation of most eigenvalue algorithms are matrix-vector multiplication, matrix transposition, and linear system solution. Their implementations on the above architectures are described, as well as parallel implementations of the following classes of eigenvalue methods: QR, bisection, divide-and-conquer, and the Lanczos algorithm.