A Case Study of MPI Over Long Distance Connections
Abstract
Scientific workflows are increasingly distributed across wide-area networks, and their code executions are expected to span geographically dispersed computing systems. MPI has been used extensively to support communications for distributed computations, typically over compute clusters and high-performance systems within a single facility. We present a case study of the performance of basic MPI operations over long-distance connections, wherein TCP is used as the underlying transport. We report measurements of the execution times of MPI codes that use MPI_Sendrecv operations over emulated 10 Gbps connections with round-trip times of 0 to 366 ms, the longest of which emulates a connection spanning the globe. These measurements demonstrate that basic MPI codes can be sustained over long-distance connections under external packet loss rates of up to 10%. They also highlight the qualitative effects of losses, which manifest as increased execution times as a consequence of TCP's loss recovery process.
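The abstract's parameters (10 Gbps links, round-trip times up to 366 ms, loss rates up to 10%) can be related to TCP behavior with two standard back-of-the-envelope formulas. The Python sketch that follows is illustrative only and is not the authors' measurement code. It computes the bandwidth-delay product (how many bytes TCP must keep in flight to fill the link) and the Mathis et al. approximation of loss-limited TCP Reno throughput. The 1460-byte MSS, the 10 ms and 100 ms RTT values, and the helper names are hypothetical assumptions. The Mathis model is a steady-state, bulk-transfer approximation that ignores retransmission timeouts, so it is unreliable near 10% loss, and TCP variants such as CUBIC behave differently from Reno.

```python
import math


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: data that must be in flight to fill the link."""
    return rate_bps * rtt_s / 8.0


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Mathis et al. estimate of TCP Reno throughput under random loss,
    (MSS / RTT) * C / sqrt(p), returned in bits per second.

    Assumption: steady-state bulk transfer; retransmission timeouts are
    ignored, so the estimate is unreliable at high loss rates such as 10%.
    """
    if rtt_s <= 0 or loss_rate <= 0:
        raise ValueError("rtt_s and loss_rate must be positive")
    return (mss_bytes * 8.0 / rtt_s) * c / math.sqrt(loss_rate)


def achievable_bps(link_bps: float, mss_bytes: float, rtt_s: float,
                   loss_rate: float) -> float:
    """A single TCP stream cannot exceed the link rate, so cap the estimate."""
    return min(link_bps, mathis_throughput_bps(mss_bytes, rtt_s, loss_rate))


if __name__ == "__main__":
    LINK_BPS = 10e9   # 10 Gbps emulated connection, as in the abstract
    MSS = 1460        # hypothetical: typical Ethernet MSS in bytes
    # 366 ms is the longest RTT in the study; 10 and 100 ms are illustrative.
    for rtt_ms in (10, 100, 366):
        rtt = rtt_ms / 1000.0
        print(f"RTT {rtt_ms:>3} ms: BDP = {bdp_bytes(LINK_BPS, rtt) / 1e6:8.1f} MB")
        for p in (0.001, 0.01, 0.1):  # 10% is the highest loss rate studied
            mbps = achievable_bps(LINK_BPS, MSS, rtt, p) / 1e6
            print(f"    loss {p:>5.1%}: ~{mbps:10.2f} Mbps (Mathis estimate)")
```

At 366 ms the bandwidth-delay product is about 457.5 MB. Because a TCP stream's throughput is at most its window divided by the RTT, any window smaller than that keeps a single stream below 10 Gbps even with no loss. The loss-driven estimate then shrinks further as 1/sqrt(p).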
- Authors:
- Rao, Nageswara S.; Imam, Neena; Boehm, Swen [ORNL]
- Publication Date:
- April 2019
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- OSTI Identifier:
- 1559642
- DOE Contract Number:
- AC05-00OR22725
- Resource Type:
- Conference
- Resource Relation:
- Conference: 13th Annual IEEE International Systems Conference (SysCon 2019), Orlando, Florida, United States of America, April 8-11, 2019
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Rao, Nageswara S., Imam, Neena, and Boehm, Swen. A Case Study of MPI Over Long Distance Connections. United States: N. p., 2019. Web.
Rao, Nageswara S., Imam, Neena, & Boehm, Swen. A Case Study of MPI Over Long Distance Connections. United States.
Rao, Nageswara S., Imam, Neena, and Boehm, Swen. 2019. "A Case Study of MPI Over Long Distance Connections". United States. https://www.osti.gov/servlets/purl/1559642.
@inproceedings{osti_1559642,
title = {A Case Study of MPI Over Long Distance Connections},
author = {Rao, Nageswara S. and Imam, Neena and Boehm, Swen},
abstractNote = {Scientific workflows are increasingly being distributed across wide-area networks, and their code executions are expected to span across geographically dispersed computing systems. MPI has been extensively used to support communications for distributed computations, typically, over compute clusters and high-performance systems within a single facility. We present a case study of performance of MPI basic operations over long distance connections, wherein TCP is used for the underlying transport. We present measurements of execution times of MPI codes that utilize MPI Sendrecv operations over emulated 10Gbps connections with 0-366ms round-trip times, including the longest one spanning the globe. They demonstrate that basic MPI codes can be sustained over long distance connections under external packet loss rates up to 10%. They also highlight the qualitative effects of losses which manifest as increased execution times as a consequence of TCP’s loss recovery process.},
booktitle = {13th Annual IEEE International Systems Conference (SysCon 2019)},
address = {Orlando, Florida, United States of America},
url = {https://www.osti.gov/biblio/1559642},
place = {United States},
year = {2019},
month = {4}
}