OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: On Analytics of File Transfer Rates over Dedicated Wide-Area Connections

Abstract

File transfers between the decentralized storage sites over dedicated wide-area connections are becoming increasingly important in high-performance computing and big data scenarios. Designing such scientific workflows for large file transfers is extremely challenging as they depend on the file, I/O, host, and local- and wide-area network subsystems, and their interactions. To gain insights into file-transfer rate profiles, we develop polynomial, bagging, and boosting regression models for Lustre and XFS file transfer measurements, which are collected using XDD over a suite of 10 Gbps connections with 0-366 ms round trip times (RTTs). In addition to overall trends and analytics, these regressions also provide file-transfer rate estimates for RTTs and number of parallel flows at which measurements might not have been collected. They show that bagging and boosting techniques provide closer data fits than the polynomial regression. We develop probabilistic bounds on the generalization error of these methods, which combined with the cross-validation error establish that the former two are more accurate estimators than the polynomial regression. In addition, we present a method to efficiently determine the number of parallel flows to achieve a peak file-transfer rate using fewer than full sweep measurements; in our measurements, the peak is achieved in 96% of cases with 15-25% of measurements of a full sweep.

Authors:
Sen, Satyabrata [1]; Rao, Nageswara S. [1]; Liu, Qiang [1]; Imam, Neena [1]; Kettimuthu, R. [2]; Foster, I. [2]
  1. ORNL
  2. Argonne National Laboratory (ANL)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1435302
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: First International Workshop on Workflow Science (WoWS), Auckland, New Zealand, 10/24/2017-10/27/2017
Country of Publication:
United States
Language:
English

Citation Formats

Sen, Satyabrata, Rao, Nageswara S., Liu, Qiang, Imam, Neena, Kettimuthu, R., and Foster, I. On Analytics of File Transfer Rates over Dedicated Wide-Area Connections. United States: N. p., 2017. Web. doi:10.1109/eScience.2017.93.
Sen, Satyabrata, Rao, Nageswara S., Liu, Qiang, Imam, Neena, Kettimuthu, R., & Foster, I. On Analytics of File Transfer Rates over Dedicated Wide-Area Connections. United States. https://doi.org/10.1109/eScience.2017.93
Sen, Satyabrata, Rao, Nageswara S., Liu, Qiang, Imam, Neena, Kettimuthu, R., and Foster, I. Sun . "On Analytics of File Transfer Rates over Dedicated Wide-Area Connections". United States. https://doi.org/10.1109/eScience.2017.93. https://www.osti.gov/servlets/purl/1435302.
@article{osti_1435302,
title = {On Analytics of File Transfer Rates over Dedicated Wide-Area Connections},
author = {Sen, Satyabrata and Rao, Nageswara S. and Liu, Qiang and Imam, Neena and Kettimuthu, R. and Foster, I.},
abstractNote = {File transfers between the decentralized storage sites over dedicated wide-area connections are becoming increasingly important in high-performance computing and big data scenarios. Designing such scientific workflows for large file transfers is extremely challenging as they depend on the file, I/O, host, and local- and wide-area network subsystems, and their interactions. To gain insights into file-transfer rate profiles, we develop polynomial, bagging, and boosting regression models for Lustre and XFS file transfer measurements, which are collected using XDD over a suite of 10 Gbps connections with 0-366 ms round trip times (RTTs). In addition to overall trends and analytics, these regressions also provide file-transfer rate estimates for RTTs and number of parallel flows at which measurements might not have been collected. They show that bagging and boosting techniques provide closer data fits than the polynomial regression. We develop probabilistic bounds on the generalization error of these methods, which combined with the cross-validation error establish that the former two are more accurate estimators than the polynomial regression. In addition, we present a method to efficiently determine the number of parallel flows to achieve a peak file-transfer rate using fewer than full sweep measurements; in our measurements, the peak is achieved in 96% of cases with 15-25% of measurements of a full sweep.},
doi = {10.1109/eScience.2017.93},
url = {https://www.osti.gov/biblio/1435302},
place = {United States},
year = {2017},
month = {10}
}
