OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Machine Learning Methods for Connection RTT and Loss Rate Estimation Using MPI Measurements Under Random Losses

Abstract

Scientific computations are expected to be increasingly distributed across wide-area networks, and Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. Application-level measurements of MPI operations reflect the connection Round-Trip Time (RTT) and loss rate, and machine learning methods have been previously developed to estimate them under deterministic periodic losses. In this paper, we consider more complex, random losses with uniform, Poisson and Gaussian distributions. We study five disparate machine learning methods, with linear and non-linear, and smooth and non-smooth properties, to estimate RTT and loss rate over 10 Gbps connections with 0-366 ms RTT. The diversity and complexity of these estimators, combined with the randomness of losses and TCP’s non-linear response, together rule out the selection of a single best among them; instead, we fuse them to retain their design diversity. Overall, the results show that accurate estimates can be generated at low loss rates but become inaccurate at loss rates of 10% and higher, thereby illustrating both their strengths and limitations.
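
The abstract's idea of fusing several diverse estimators rather than selecting one can be illustrated with a minimal sketch. This record does not describe the fuser or the five estimators the paper uses, so everything below is an assumption made for illustration: the inverse-MSE weighting rule, the three toy estimators, the helper names `mse` and `fuse_inverse_mse`, and the synthetic data (a made-up relation in which a measured MPI operation time is roughly twice the RTT plus noise).

```python
import random

def mse(estimator, xs, ys):
    """Mean squared error of one estimator on paired samples."""
    return sum((estimator(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fuse_inverse_mse(estimators, xs, ys):
    """Build a convex-combination fuser with weights proportional to 1/MSE.

    Illustrative rule only; not taken from the paper.
    """
    raw = [1.0 / (mse(e, xs, ys) + 1e-12) for e in estimators]
    total = sum(raw)
    weights = [r / total for r in raw]

    def fused(x):
        return sum(w * e(x) for w, e in zip(weights, estimators))

    return fused, weights

# Synthetic, made-up data: measured MPI operation time (ms) vs. true RTT (ms).
rng = random.Random(0)
true_rtt = [rng.uniform(0.0, 366.0) for _ in range(200)]
measured = [2.0 * r + rng.gauss(0.0, 5.0) for r in true_rtt]

# Three hypothetical estimators with different properties.
estimators = [
    lambda t: t / 2.0,                  # linear and smooth
    lambda t: t / 2.0 + 4.0,            # linear with a constant bias
    lambda t: round(t / 20.0) * 10.0,   # non-smooth, piecewise constant
]

fused, weights = fuse_inverse_mse(estimators, measured, true_rtt)
individual = [mse(e, measured, true_rtt) for e in estimators]
fused_error = mse(fused, measured, true_rtt)

print("weights:", weights)
print("individual MSEs:", individual)
print("fused MSE:", fused_error)
```

Because the fused output is a convex combination, its squared error on a given data set cannot exceed that of the worst member on the same data; it is not guaranteed to beat the best member. For simplicity the sketch computes weights and errors on the same samples; a real evaluation would use separate training and test measurements.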

Authors:
Rao, Nageswara S. V.; Imam, Neena; Liu, Zhengchun; Kettimuthu, Rajkumar; Foster, Ian
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
U.S. Department of Defense (DOD); USDOE Office of Science - Office of Advanced Scientific Computing Research
OSTI Identifier:
1668341
DOE Contract Number:  
AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: 2nd International Conference on Machine Learning for Networking, 12/03/19 - 12/05/19, Paris, FR
Country of Publication:
United States
Language:
English
Subject:
Generalization Bounds; Information Fusion; Loss Rate; Machine Learning; Message Passing Interface; Regression; Round Trip Time

Citation Formats

Rao, Nageswara S. V., Imam, Neena, Liu, Zhengchun, Kettimuthu, Rajkumar, and Foster, Ian. Machine Learning Methods for Connection RTT and Loss Rate Estimation Using MPI Measurements Under Random Losses. United States: N. p., 2020. Web. doi:10.1007/978-3-030-45778-5_11.
Rao, Nageswara S. V., Imam, Neena, Liu, Zhengchun, Kettimuthu, Rajkumar, & Foster, Ian. Machine Learning Methods for Connection RTT and Loss Rate Estimation Using MPI Measurements Under Random Losses. United States. https://doi.org/10.1007/978-3-030-45778-5_11
Rao, Nageswara S. V., Imam, Neena, Liu, Zhengchun, Kettimuthu, Rajkumar, and Foster, Ian. 2020. "Machine Learning Methods for Connection RTT and Loss Rate Estimation Using MPI Measurements Under Random Losses". United States. https://doi.org/10.1007/978-3-030-45778-5_11.
@article{osti_1668341,
title = {Machine Learning Methods for Connection RTT and Loss Rate Estimation Using MPI Measurements Under Random Losses},
author = {Rao, Nageswara S. V. and Imam, Neena and Liu, Zhengchun and Kettimuthu, Rajkumar and Foster, Ian},
abstractNote = {Scientific computations are expected to be increasingly distributed across wide-area networks, and Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. Application-level measurements of MPI operations reflect the connection Round-Trip Time (RTT) and loss rate, and machine learning methods have been previously developed to estimate them under deterministic periodic losses. In this paper, we consider more complex, random losses with uniform, Poisson and Gaussian distributions. We study five disparate machine learning methods, with linear and non-linear, and smooth and non-smooth properties, to estimate RTT and loss rate over 10 Gbps connections with 0-366 ms RTT. The diversity and complexity of these estimators, combined with the randomness of losses and TCP’s non-linear response, together rule out the selection of a single best among them; instead, we fuse them to retain their design diversity. Overall, the results show that accurate estimates can be generated at low loss rates but become inaccurate at loss rates of 10% and higher, thereby illustrating both their strengths and limitations.},
doi = {10.1007/978-3-030-45778-5_11},
url = {https://www.osti.gov/biblio/1668341},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2020},
month = {1}
}
