Merged Requests for Better Performance and Productivity in Multithreaded OpenSHMEM
- ORNL
A merged request is a handle representing a group of Remote Memory Access (RMA), atomic, or collective operations. A merged request can be created either by combining multiple outstanding merged request handles or by reusing the same merged request handle for additional operations. We show that introducing such simple yet powerful semantics in OpenSHMEM provides many productivity and performance advantages. In this paper, we first introduce the interfaces and semantics for creating and using merged request handles. Then, we demonstrate that merged requests yield better performance characteristics in multithreaded OpenSHMEM applications. In particular, we show that one can achieve a higher message rate, higher bandwidth for smaller messages, and better computation-communication overlap. Further, we use merged requests to realize multithreaded collectives, where multiple threads cooperate to complete the collective operation. Our experimental results show that in a multithreaded OpenSHMEM program, the merged-request-based RMA operations achieve over 100 Million Messages Per Second (MMPS). In a single-threaded environment, they achieve over 10 MMPS, compared to 4.5 MMPS with default RMA operations. We also achieve higher bandwidth for smaller message sizes, close to 100% overlap, and a 60% reduction in latency.
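The abstract does not reproduce the paper's actual interface, so the following is only a toy model in C of the two ways of building a merged request that it describes: reusing one handle for additional operations, and combining two outstanding handles. Every name here (`merged_req_t`, `merged_req_add_op`, `merged_req_merge`, `merged_req_wait`, `merged_req_demo`) is hypothetical and chosen for illustration. No OpenSHMEM calls are made, and operations are merely counted rather than issued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the paper's interface: a merged request
   only tracks how many operations are outstanding under one handle. */
typedef struct {
    size_t pending;   /* operations attached but not yet completed */
    size_t completed; /* operations already retired */
} merged_req_t;

/* Create an empty handle. */
static merged_req_t merged_req_init(void) {
    merged_req_t r = {0, 0};
    return r;
}

/* Reuse the same handle for one additional operation. */
static void merged_req_add_op(merged_req_t *req) {
    req->pending++;
}

/* Combine two outstanding handles: dst absorbs src, src becomes empty. */
static void merged_req_merge(merged_req_t *dst, merged_req_t *src) {
    dst->pending += src->pending;
    dst->completed += src->completed;
    src->pending = 0;
    src->completed = 0;
}

/* Stand-in for a single completion call: retires every pending
   operation in the group and returns how many were retired. */
static size_t merged_req_wait(merged_req_t *req) {
    size_t n = req->pending;
    req->completed += n;
    req->pending = 0;
    return n;
}

/* Attach three operations to one handle and two to another, merge
   them, then complete the whole group with one wait. */
size_t merged_req_demo(void) {
    merged_req_t a = merged_req_init();
    merged_req_t b = merged_req_init();
    for (int i = 0; i < 3; i++) merged_req_add_op(&a);
    for (int i = 0; i < 2; i++) merged_req_add_op(&b);
    merged_req_merge(&a, &b);
    assert(a.pending == 5 && b.pending == 0);
    return merged_req_wait(&a);
}
```

In this model, the point of the grouping is that completion is tracked once per handle rather than once per operation; how the real implementation issues and completes the underlying operations is not stated in the abstract.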
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1399981
- Resource Relation:
- Conference: OpenSHMEM 2017: Fourth workshop on OpenSHMEM and Related Technologies, Annapolis, Maryland, United States of America, August 7-9, 2017
- Country of Publication:
- United States
- Language:
- English
Similar Records
Designing a High Performance OpenSHMEM Implementation using Universal Common Communication Substrate as a Communication Middleware
GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM