U.S. Department of Energy
Office of Scientific and Technical Information

Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA

Conference
  1. University of Utah
  2. Sandia National Laboratories (SNL)
  3. Texas Advanced Computing Center
  4. Oak Ridge National Laboratory (ORNL)

The extreme-scale computing landscape is increasingly dominated by GPU-accelerated systems. At the same time, in-situ workflows that employ memory-to-memory inter-application data exchanges have emerged as an effective approach for leveraging these extreme-scale systems. In the case of GPUs, GPUDirect RDMA enables third-party devices, such as network interface cards, to access GPU memory directly and has been adopted for intra-application communications across GPUs. In this paper, we present an interoperable framework for GPU-based in-situ workflows that optimizes data movement using GPUDirect RDMA. Specifically, we analyze the characteristics of the possible data movement pathways between GPUs from an in-situ workflow perspective, and design a strategy that maximizes throughput. Furthermore, we implement this approach as an extension of the DataSpaces data staging service, and experimentally evaluate its performance and scalability on a current leadership-class GPU cluster. The performance results show that the proposed design reduces data-movement time by up to 53% and 40% for the sender and receiver, respectively, and maintains excellent scalability for up to 256 GPUs.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
2000374
Country of Publication:
United States
Language:
English


Similar Records

Dual Channel Dual Staging: Hierarchical and Portable Staging for GPU-Based In-Situ Workflow
Conference · 2024 · OSTI ID:2538207

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
Conference · 2018 · OSTI ID:1511696

CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows
Journal Article · 2020 · ACM Transactions on Parallel Computing · OSTI ID:1769940
