U.S. Department of Energy
Office of Scientific and Technical Information

DRAGON: breaking GPU memory capacity limits with direct NVM access

Conference

Heterogeneous computing with accelerators is growing in importance in high performance computing (HPC). Recently, application datasets have expanded beyond the memory capacity of these accelerators, and often beyond the capacity of their hosts. Meanwhile, nonvolatile memory (NVM) storage has emerged as a pervasive component in HPC systems because NVM provides massive amounts of memory capacity at an affordable cost. Currently, for accelerator applications to use NVM, they must manually orchestrate data movement across multiple memories, and this approach performs well only for applications with simple access behaviors. To address this issue, we developed DRAGON, a solution that enables all classes of GP-GPU applications to transparently compute on terabyte datasets residing in NVM. DRAGON leverages the page-faulting mechanism on recent NVIDIA GPUs by extending the capabilities of CUDA Unified Memory (UM). Our experimental results show that DRAGON transparently expands memory capacity and obtains additional speedups via automated I/O and data transfer overlapping.
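
The contrast the abstract draws between manually orchestrated data movement and transparent, page-fault-driven access can be illustrated with a host-side analogy. The sketch below is not DRAGON and does not use CUDA or a GPU: it assumes an ordinary file stands in for the NVM-resident dataset and uses the operating system's memory mapping in place of Unified Memory. The function names, the dataset, and the offsets are hypothetical and chosen only for illustration.

```python
import mmap
import os
import tempfile


def gather_manual(path, offsets):
    """Manual orchestration: the application issues every transfer itself."""
    out = []
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)                 # explicit positioning for each access
            out.append(f.read(1)[0])    # explicit read into application memory
    return out


def gather_mapped(path, offsets):
    """Transparent access: index the mapping and let page faults fetch data."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # No explicit I/O calls; the OS pages data in on first touch.
            return [m[off] for off in offsets]


def demo():
    # Hypothetical 16 KiB dataset whose byte at offset i equals i % 256.
    data = bytes(range(256)) * 64
    # An irregular access pattern, the case where manual staging is awkward.
    offsets = [0, 5000, 17, 16383, 256, 9999]
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return gather_manual(path, offsets), gather_mapped(path, offsets)
    finally:
        os.remove(path)


if __name__ == "__main__":
    manual, mapped = demo()
    print(manual == mapped)
```

Both functions return the same values. The difference is who schedules the I/O: in the first the application does, and in the second the paging mechanism does, which is the role the abstract attributes to GPU page faulting.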

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1489577
Country of Publication:
United States
Language:
English
