Title: Reducing the bulk of the bulk synchronous parallel model

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.
Authors:
; ; ;
Publication Date:
OSTI Identifier:
1095835
Report Number(s):
SAND2013-8579J
Journal ID: ISSN 0129-6264; 476654
DOE Contract Number:
AC04-94AL85000
Resource Type:
Journal Article
Resource Relation:
Journal Name: Parallel Processing Letters; Journal Volume: 23; Journal Issue: 04; Related Information: Proposed for publication in Parallel Processing Letters.
Publisher:
World Scientific
Research Org:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; BSP; MPI; contention; message aggregation