
Title: IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism

Conference

We propose IMPACC, an MPI+OpenACC framework for heterogeneous accelerator clusters. IMPACC tightly integrates MPI and OpenACC while exploiting the shared memory parallelism in the target system. IMPACC dynamically adapts input MPI+OpenACC applications to the target heterogeneous accelerator cluster to fully exploit target system-specific features. IMPACC provides programmers with a unified virtual address space, automatic NUMA-friendly task-device mapping, efficient integrated communication routines, seamless streamlining of asynchronous executions, and transparent memory sharing. We have implemented IMPACC and evaluated its performance on three heterogeneous accelerator systems, including the Titan supercomputer. Results show that IMPACC achieves easier programming, higher performance, and better scalability than the current MPI+OpenACC model.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1261555
Resource Relation:
Conference: ACM International Symposium on High-Performance Parallel and Distributed Computing, Kyoto, Japan, 5/31/2016 to 6/4/2016
Country of Publication:
United States
Language:
English

