DRAGON: breaking GPU memory capacity limits with direct NVM access
- Tokyo Institute of Technology, Japan
- Oak Ridge National Laboratory (ORNL), USA
- RIKEN, Japan
Heterogeneous computing with accelerators is growing in importance in high-performance computing (HPC). Recently, application datasets have expanded beyond the memory capacity of these accelerators, and often beyond the capacity of their hosts. Meanwhile, nonvolatile memory (NVM) storage has emerged as a pervasive component in HPC systems because it provides massive memory capacity at affordable cost. Currently, for accelerator applications to use NVM, they must manually orchestrate data movement across multiple memories, an approach that performs well only for applications with simple access behaviors. To address this issue, we developed DRAGON, a solution that enables all classes of GPGPU applications to transparently compute on terabyte datasets residing in NVM. DRAGON leverages the page-faulting mechanism of recent NVIDIA GPUs by extending the capabilities of CUDA Unified Memory (UM). Our experimental results show that DRAGON transparently expands memory capacity and obtains additional speedups via automatic overlapping of I/O and data transfers.
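The manual-versus-transparent contrast drawn in the abstract can be sketched with standard CUDA runtime calls. The first function below stages an NVM-resident dataset through a bounded device buffer by hand, the orchestration burden the abstract describes; the second uses cudaMallocManaged, the UM allocation that lets the GPU page-fault data in on demand and that DRAGON extends so the allocation can be backed by an NVM-resident file (the paper exposes this through an mmap-like dragon_map call, referenced here only in a comment). The kernel, chunk size, and file handling are illustrative placeholders, not code from the paper.

```c
/* Minimal sketch contrasting manual data orchestration with CUDA
 * Unified Memory. Standard CUDA runtime API only; error handling
 * omitted for brevity. Compile with nvcc. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

/* Manual orchestration: the application chunks an NVM-resident file
 * through host and device staging buffers. This is efficient only when
 * the access pattern is simple enough to plan transfers by hand. */
static void scale_manual(FILE *nvm_file, size_t n) {
    const size_t CHUNK = 1u << 26;                 /* 64 Mi floats per pass */
    float *h_buf = (float *)malloc(CHUNK * sizeof(float));
    float *d_buf;
    cudaMalloc(&d_buf, CHUNK * sizeof(float));
    for (size_t off = 0; off < n; off += CHUNK) {
        size_t len = (n - off < CHUNK) ? (n - off) : CHUNK;
        fread(h_buf, sizeof(float), len, nvm_file);
        cudaMemcpy(d_buf, h_buf, len * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(unsigned)((len + 255) / 256), 256>>>(d_buf, len, 2.0f);
        cudaMemcpy(h_buf, d_buf, len * sizeof(float), cudaMemcpyDeviceToHost);
        /* ...write the updated chunk back to NVM... */
    }
    cudaFree(d_buf);
    free(h_buf);
}

/* UM-based pattern that DRAGON builds on: the kernel touches one large
 * managed allocation and the driver pages data in on GPU page faults.
 * DRAGON replaces this allocation with an NVM-file-backed mapping
 * (dragon_map in the paper), so its size can exceed GPU and host RAM. */
static void scale_managed(size_t n) {
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    /* ...populate data, e.g. by reading the NVM file... */
    scale<<<(unsigned)((n + 255) / 256), 256>>>(data, n, 2.0f); /* single launch assumed */
    cudaDeviceSynchronize();
    cudaFree(data);
}

int main(void) {
    (void)scale_manual;      /* shown for contrast; needs a real NVM-resident file */
    scale_managed(1u << 20); /* small demo size */
    return 0;
}
```

Per the abstract, the appeal of the second pattern is that it keeps working unchanged when the dataset outgrows every tier of volatile memory, since page faults are ultimately served from the NVM file rather than from host RAM.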
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1489577
- Country of Publication: United States
- Language: English