Optimizing point‐to‐point communication between adaptive MPI endpoints in shared memory
- Department of Computer Science, University of Illinois at Urbana-Champaign, IL 61801-2302, USA
Adaptive MPI (AMPI) is an implementation of the MPI standard that supports the virtualization of ranks as user-level threads rather than OS processes. In this work, we optimize AMPI's communication performance based on the locality of the communicating endpoints within a cluster of SMP nodes. We differentiate between point-to-point messages whose endpoints are co-located on the same execution unit and point-to-point messages whose endpoints reside in the same process but on different execution units. We demonstrate how the messaging semantics of Charm++ both enable and hinder AMPI's implementation in different ways, and we motivate extensions to Charm++ that address the limitations. Using the OSU micro-benchmark suite, we show that our locality-aware design offers lower latency, higher bandwidth, and a reduced memory footprint for applications.
- Sponsoring Organization:
- USDOE
- Grant/Contract Number:
- NA0002374
- OSTI ID:
- 1582085
- Journal Information:
- Concurrency and Computation: Practice and Experience, Vol. 32, Issue 3; ISSN 1532-0626
- Publisher:
- Wiley Blackwell (John Wiley & Sons)
- Country of Publication:
- United Kingdom
- Language:
- English