DRAGON: breaking GPU memory capacity limits with direct NVM access
- Tokyo Institute of Technology, Japan
- Oak Ridge National Laboratory (ORNL), USA
- RIKEN, Japan
Heterogeneous computing with accelerators is growing in importance in high-performance computing (HPC). Recently, application datasets have expanded beyond the memory capacity of these accelerators, and often beyond the capacity of their hosts. Meanwhile, nonvolatile memory (NVM) storage has emerged as a pervasive component in HPC systems because it provides massive memory capacity at affordable cost. Currently, for accelerator applications to use NVM, they must manually orchestrate data movement across multiple memories, an approach that performs well only for applications with simple access behaviors. To address this issue, we developed DRAGON, a solution that enables all classes of GPGPU applications to transparently compute on terabyte datasets residing in NVM. DRAGON leverages the page-faulting mechanism of recent NVIDIA GPUs by extending the capabilities of CUDA Unified Memory (UM). Our experimental results show that DRAGON transparently expands memory capacity and obtains additional speedups via automatic overlapping of I/O and data transfers.
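The manual-versus-transparent contrast drawn in the abstract can be sketched with standard CUDA runtime calls. The first function below stages an NVM-resident dataset through a bounded device buffer by hand, the orchestration burden the abstract describes; the second uses cudaMallocManaged, the UM allocation that lets the GPU page-fault data in on demand and that DRAGON extends so the allocation can be backed by an NVM-resident file (the paper exposes this through an mmap-like dragon_map call, referenced here only in a comment). The kernel, chunk size, and file handling are illustrative placeholders, not code from the paper.

```c
/* Minimal sketch contrasting manual data orchestration with CUDA
 * Unified Memory. Standard CUDA runtime API only; error handling
 * omitted for brevity. Compile with nvcc. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

/* Manual orchestration: the application chunks an NVM-resident file
 * through host and device staging buffers. This is efficient only when
 * the access pattern is simple enough to plan transfers by hand. */
static void scale_manual(FILE *nvm_file, size_t n) {
    const size_t CHUNK = 1u << 26;                 /* 64 Mi floats per pass */
    float *h_buf = (float *)malloc(CHUNK * sizeof(float));
    float *d_buf;
    cudaMalloc(&d_buf, CHUNK * sizeof(float));
    for (size_t off = 0; off < n; off += CHUNK) {
        size_t len = (n - off < CHUNK) ? (n - off) : CHUNK;
        fread(h_buf, sizeof(float), len, nvm_file);
        cudaMemcpy(d_buf, h_buf, len * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(unsigned)((len + 255) / 256), 256>>>(d_buf, len, 2.0f);
        cudaMemcpy(h_buf, d_buf, len * sizeof(float), cudaMemcpyDeviceToHost);
        /* ...write the updated chunk back to NVM... */
    }
    cudaFree(d_buf);
    free(h_buf);
}

/* UM-based pattern that DRAGON builds on: the kernel touches one large
 * managed allocation and the driver pages data in on GPU page faults.
 * DRAGON replaces this allocation with an NVM-file-backed mapping
 * (dragon_map in the paper), so its size can exceed GPU and host RAM. */
static void scale_managed(size_t n) {
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    /* ...populate data, e.g. by reading the NVM file... */
    scale<<<(unsigned)((n + 255) / 256), 256>>>(data, n, 2.0f); /* single launch assumed */
    cudaDeviceSynchronize();
    cudaFree(data);
}

int main(void) {
    (void)scale_manual;      /* shown for contrast; needs a real NVM-resident file */
    scale_managed(1u << 20); /* small demo size */
    return 0;
}
```

Per the abstract, the appeal of the second pattern is that it keeps working unchanged when the dataset outgrows every tier of volatile memory, since page faults are ultimately served from the NVM file rather than from host RAM.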
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1489577
- Country of Publication: United States
- Language: English