IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism
- ORNL
We propose IMPACC, an MPI+OpenACC framework for heterogeneous accelerator clusters. IMPACC tightly integrates MPI and OpenACC while exploiting the shared-memory parallelism in the target system. IMPACC dynamically adapts input MPI+OpenACC applications to the target heterogeneous accelerator cluster to fully exploit system-specific features. IMPACC provides programmers with a unified virtual address space, automatic NUMA-friendly task-device mapping, efficient integrated communication routines, seamless streamlining of asynchronous executions, and transparent memory sharing. We have implemented IMPACC and evaluated its performance on three heterogeneous accelerator systems, including the Titan supercomputer. Results show that IMPACC achieves easier programming, higher performance, and better scalability than the current MPI+OpenACC model.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1261555
- Resource Relation:
- Conference: ACM International Symposium on High-Performance Parallel and Distributed Computing, Kyoto, Japan, May 31–June 4, 2016
- Country of Publication:
- United States
- Language:
- English