IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism
- ORNL
We propose IMPACC, an MPI+OpenACC framework for heterogeneous accelerator clusters. IMPACC tightly integrates MPI and OpenACC while exploiting the shared-memory parallelism in the target system. IMPACC dynamically adapts input MPI+OpenACC applications to the target heterogeneous accelerator cluster to fully exploit system-specific features. IMPACC provides programmers with a unified virtual address space, automatic NUMA-friendly task-device mapping, efficient integrated communication routines, seamless streamlining of asynchronous executions, and transparent memory sharing. We have implemented IMPACC and evaluated its performance on three heterogeneous accelerator systems, including the Titan supercomputer. Results show that IMPACC achieves easier programming, higher performance, and better scalability than the current MPI+OpenACC model.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1261555
- Resource Relation:
- Conference: ACM International Symposium on High-Performance Parallel and Distributed Computing, Kyoto, Japan, May 31–June 4, 2016
- Country of Publication:
- United States
- Language:
- English