Reducing the bulk of the bulk synchronous parallel model
For over two decades the dominant means of enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk synchronous parallel (BSP) programming model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide both the motivation and the capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases.
- Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number: AC04-94AL85000
- OSTI ID: 1095835
- Report Number(s): SAND2013-8579J; 476654
- Journal Information: Parallel Processing Letters, Vol. 23, Issue 04; Related Information: Proposed for publication in Parallel Processing Letters; ISSN 0129-6264
- Publisher: World Scientific
- Country of Publication: United States
- Language: English
Similar Records
Compiled MPI: Cost-Effective Exascale Applications Development
HPC-Colony: Services and Interfaces to Support Systems With Very Large Numbers of Processors