Skip to main content
U.S. Department of Energy
Office of Scientific and Technical Information

Rendezvous algorithms for large-scale modeling and simulation

Journal Article · · Journal of Parallel and Distributed Computing
 [1];  [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Argonne National Lab. (ANL), Lemont, IL (United States)

Rendezvous algorithms encode a communication pattern that is useful when processors sending data do not know who the receiving processors should be, or vice versa. The idea is to define an intermediate decomposition where datums from different sending processors can ”rendezvous” to perform a computation, in a manner that both the senders and eventual receivers of the results can identify the appropriate rendezvous processor. Though they were originally designed for interpolating between overlaid grids with independent parallel decompositions (Plimpton et al., 2004), we have recently found rendezvous algorithms useful for a variety of operations in particle- or grid-based simulation codes when running large problems on large numbers of processors. In particular, we show they can perform well when a load-balanced intermediate decomposition is randomized and not spatial, requiring all-to-all communication to move data between processors. In this case rendezvous algorithms leverage the large bisection communication bandwidths which parallel machines provide. We describe how rendezvous algorithms work in a scientific computing context and give specific examples for molecular dynamics and Direct Simulation Monte Carlo codes which result in dramatic performance improvements versus simpler algorithms which do not scale as well. We explain how a generic rendezvous algorithm can be implemented, and also point out similarities with the MapReduce paradigm popularized by Google and Hadoop.

Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000; NA0003525; AC02-06CH11357
OSTI ID:
1668692
Alternate ID(s):
OSTI ID: 2325459; OSTI ID: 1776628
Report Number(s):
SAND--2020-4542J; {685787,"Journal ID: ISSN 0743-7315"}
Journal Information:
Journal of Parallel and Distributed Computing, Journal Name: Journal of Parallel and Distributed Computing Vol. 147; ISSN 0743-7315
Publisher:
ElsevierCopyright Statement
Country of Publication:
United States
Language:
English