MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation
Journal Article · ACM Transactions on Architecture and Code Optimization
- College of William and Mary, Williamsburg, VA (United States)
- Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA (United States)
The many-body correlation function is a fundamental computation kernel in modern physics computing applications, e.g., hadron contractions in lattice quantum chromodynamics (QCD). This kernel is both computation- and memory-intensive, involving a series of tensor contractions, and thus usually runs on accelerators such as GPUs. Existing optimizations of many-body correlation mainly focus on individual tensor contractions (e.g., cuBLAS and similar libraries). In contrast, this work discovers a new optimization dimension for many-body correlation by exploring optimization opportunities among tensor contractions. More specifically, it targets general GPU architectures (both NVIDIA and AMD) and optimizes the memory management of many-body correlation by exploiting a set of opportunities to eliminate redundant memory allocation and communication. First, GPU memory allocation redundancy: intermediate outputs frequently occur as inputs to subsequent calculations. Second, CPU-GPU communication redundancy: although all tensors are allocated on both the CPU and the GPU, many of them are used (and reused) on the GPU side only, so many CPU-GPU communications (like those in existing Unified Memory designs) are unnecessary. Third, GPU oversubscription: limited GPU memory size causes oversubscription, and existing memory management usually evicts data that is about to be reused, thus incurring extra CPU-GPU memory communication.
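As a rough illustration of the third opportunity, the toy simulation below counts how many host-to-device loads a sequence of tensor accesses incurs when GPU memory holds only a few tensors. It is a minimal sketch and not MemHC's actual policy, which the abstract does not describe: the function name `count_transfers`, the policy labels `"lru"` and `"lookahead"`, and the access trace are all hypothetical, and the look-ahead policy is the textbook "evict the tensor whose next use is farthest away" rule, assuming the contraction order is known in advance.

```python
from collections import OrderedDict


def count_transfers(accesses, capacity, policy):
    """Count host-to-device loads for a trace of tensor accesses.

    accesses: list of tensor names in the order contractions touch them
    capacity: number of tensors that fit in (simulated) GPU memory
    policy:   "lru" or "lookahead" (both labels are hypothetical)
    """
    resident = OrderedDict()  # tensors on the GPU, least recently used first
    loads = 0
    for i, name in enumerate(accesses):
        if name in resident:
            resident.move_to_end(name)  # hit: refresh recency, no transfer
            continue
        loads += 1  # miss: the tensor must be copied from the CPU
        if len(resident) >= capacity:
            if policy == "lru":
                victim = next(iter(resident))  # least recently used tensor
            else:
                future = accesses[i + 1:]

                def next_use(tensor):
                    # Distance to the next use; never-reused tensors rank last.
                    return future.index(tensor) if tensor in future else len(future)

                victim = max(resident, key=next_use)  # farthest next use
            del resident[victim]
        resident[name] = None
    return loads


# A cyclic trace over three tensors with room for only two on the GPU.
trace = list("ABCABC")
lru_loads = count_transfers(trace, 2, "lru")
lookahead_loads = count_transfers(trace, 2, "lookahead")
print(lru_loads, lookahead_loads)
```

On this trace, recency-based eviction always discards the tensor that is needed next, so every access becomes a transfer, while the look-ahead rule keeps soon-to-be-reused tensors resident and avoids some of those loads.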
- Research Organization:
- Thomas Jefferson National Accelerator Facility, Newport News, VA (United States)
- Sponsoring Organization:
- NSF; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Nuclear Physics (NP)
- Grant/Contract Number:
- AC05-06OR23177
- OSTI ID:
- 1867362
- Report Number(s):
- DOE/OR/23177-5487; JLAB-CST-22-3602; CCF-2047516; DE-AC05-06OR23177; 17-SC-20-SC
- Journal Information:
- ACM Transactions on Architecture and Code Optimization, Vol. 19, Issue 2; ISSN 1544-3566
- Publisher:
- Association for Computing Machinery (ACM)
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Efficient Parallelization of Irregular Applications on GPU Architectures
  Thesis/Dissertation · Sun Dec 31 23:00:00 EST 2023 · OSTI ID: 2349242
- MICCO: An Enhanced Multi-GPU Scheduling Framework for Many-Body Correlation Functions
  Conference · Sun May 01 00:00:00 EDT 2022 · 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) · OSTI ID: 1886910
- An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU
  Journal Article · Sun Jan 04 19:00:00 EST 2015 · Computer Physics Communications · OSTI ID: 1185465