Title: Estimation of RTT and Loss Rate of Wide-Area Connections Using MPI Measurements

Abstract

Scientific computations are expected to be increasingly distributed across wide-area networks, and the Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. These computations should account for certain network parameters to ensure effective execution, for example, by avoiding long and highly congested connections. The execution times of basic MPI operations reflect the connection parameters, including the Round Trip Time (RTT) and loss rate. We describe five machine learning methods to estimate the connection RTT and loss rate using execution times of basic MPI operations. We utilize execution time measurements of MPI Sendrecv operations collected over emulated 10 Gbps connections with 0-366 ms round-trip times, wherein the longest connection spans the globe, under up to 20% periodic losses. These methods provide disparate, namely, linear and non-linear, and smooth and non-smooth, estimates of RTT and loss rate. Our results show that accurate estimates can be generated at low loss rates but they become inaccurate at loss rates of 10% and higher. Overall, these results constitute a case study of the strengths and limitations of machine learning methods in inferring network-level parameters using application-level measurements.

Authors:
Rao, Nageswara S. [1]; Imam, Neena [1]; Liu, Zhengchun [2]; Kettimuthu, Rajkumar [2]; Foster, Ian [2]
  1. ORNL
  2. Argonne National Laboratory (ANL)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1659573
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: Workshop Innovating the Network for Data-Intensive Science (INDIS) - Denver, Colorado, United States of America - 11/17/2019
Country of Publication:
United States
Language:
English

Citation Formats

Rao, Nageswara S., Imam, Neena, Liu, Zhengchun, Kettimuthu, Rajkumar, and Foster, Ian. Estimation of RTT and Loss Rate of Wide-Area Connections Using MPI Measurements. United States: N. p., 2019. Web. doi:10.1109/INDIS49552.2019.00008.
Rao, Nageswara S., Imam, Neena, Liu, Zhengchun, Kettimuthu, Rajkumar, & Foster, Ian. Estimation of RTT and Loss Rate of Wide-Area Connections Using MPI Measurements. United States. https://doi.org/10.1109/INDIS49552.2019.00008
Rao, Nageswara S., Imam, Neena, Liu, Zhengchun, Kettimuthu, Rajkumar, and Foster, Ian. 2019. "Estimation of RTT and Loss Rate of Wide-Area Connections Using MPI Measurements". United States. https://doi.org/10.1109/INDIS49552.2019.00008. https://www.osti.gov/servlets/purl/1659573.
@article{osti_1659573,
title = {Estimation of RTT and Loss Rate of Wide-Area Connections Using MPI Measurements},
author = {Rao, Nageswara S. and Imam, Neena and Liu, Zhengchun and Kettimuthu, Rajkumar and Foster, Ian},
abstractNote = {Scientific computations are expected to be increasingly distributed across wide-area networks, and the Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. These computations should account for certain network parameters to ensure effective execution, for example, by avoiding long and highly congested connections. The execution times of basic MPI operations reflect the connection parameters, including the Round Trip Time (RTT) and loss rate. We describe five machine learning methods to estimate the connection RTT and loss rate using execution times of basic MPI operations. We utilize execution time measurements of MPI Sendrecv operations collected over emulated 10 Gbps connections with 0-366 ms round-trip times, wherein the longest connection spans the globe, under up to 20% periodic losses. These methods provide disparate, namely, linear and non-linear, and smooth and non-smooth, estimates of RTT and loss rate. Our results show that accurate estimates can be generated at low loss rates but they become inaccurate at loss rates of 10% and higher. Overall, these results constitute a case study of the strengths and limitations of machine learning methods in inferring network-level parameters using application-level measurements.},
doi = {10.1109/INDIS49552.2019.00008},
url = {https://www.osti.gov/biblio/1659573},
place = {United States},
year = {2019},
month = {11}
}
