DOE PAGES

Title: Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure

While future terabit networks hold the promise of significantly improving big-data motion among geographically distributed data centers, significant challenges must be overcome even on today's 100 gigabit networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink; for instance, the data storage infrastructure at both the source and sink, and its interplay with the wide-area network, are increasingly the bottleneck to achieving high performance. In this study, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present a new bulk data movement framework for terabit networks, called LADS. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface to benefit from hardware-level zero-copy and operating system bypass capabilities when available. It can further improve data transfer performance under congestion on the end systems by buffering data at the source in flash storage. With our evaluations, we show that LADS can avoid congested storage elements within the shared storage resource, improving input/output bandwidth and data transfer rates across high-speed networks. We also investigate the performance degradation of LADS due to I/O contention on the parallel file system (PFS) when multiple LADS tools share the PFS. We design and evaluate a meta-scheduler that coordinates multiple I/O streams sharing the PFS to minimize I/O contention. Finally, our evaluations show that LADS with meta-scheduling can further improve performance by up to 14 percent relative to LADS without meta-scheduling.
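The idea of exploiting the storage layout while avoiding congested storage elements can be illustrated with a small sketch. It assumes a Lustre-style layout in which each file is striped round-robin across storage targets, and it assumes that some set of targets has already been flagged as congested (how congestion is detected is not modeled). The names `Chunk`, `stripe_layout`, and `layout_aware_order` are hypothetical; this is an illustration of the general technique, not the LADS implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    offset: int  # byte offset within the file
    length: int  # number of bytes in this chunk
    target: int  # index of the storage target holding it (hypothetical model)


def stripe_layout(file_size, stripe_size, stripe_count):
    """Split a file into stripe-sized chunks placed round-robin on targets.

    Assumes a simple Lustre-style striping; real layouts may differ.
    """
    chunks = []
    for i, offset in enumerate(range(0, file_size, stripe_size)):
        length = min(stripe_size, file_size - offset)
        chunks.append(Chunk(offset, length, i % stripe_count))
    return chunks


def layout_aware_order(chunks, congested):
    """Return an issue order that interleaves reads across storage targets
    and defers every chunk on a target currently flagged as congested."""
    # Group chunks into one FIFO queue per storage target.
    queues = {}
    for c in chunks:
        queues.setdefault(c.target, deque()).append(c)

    def round_robin(targets):
        # Take one chunk from each target in turn until all queues are empty.
        order = []
        active = [queues[t] for t in sorted(targets)]
        while any(active):
            for q in active:
                if q:
                    order.append(q.popleft())
        return order

    healthy = [t for t in queues if t not in congested]
    busy = [t for t in queues if t in congested]
    # Serve uncongested targets first; congested ones are drained last.
    return round_robin(healthy) + round_robin(busy)


if __name__ == "__main__":
    chunks = stripe_layout(file_size=10, stripe_size=2, stripe_count=3)
    order = layout_aware_order(chunks, congested={1})
    print([(c.offset, c.target) for c in order])
```

Here a 10-byte file striped over three targets is read in the order of offsets 0, 4, 6 (targets 0 and 2) before offsets 2 and 8 on the congested target 1. Deferring the congested target, rather than reading the file sequentially, is what lets the remaining targets stay busy while one is overloaded.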
Authors:
Kim, Youngjae [1]; Atchley, Scott [2]; Vallee, Geoffroy R. [2]; Lee, Sangkeun [2]; Shipman, Galen M. [3]
  1. Sogang Univ., Seoul (Korea, Republic of). Dept. of Computer Science and Engineering
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  3. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
April 2016
Report Number(s):
LA-UR-17-27410
Journal ID: ISSN 1045-9219; KJ0502000; KJ0404000; ERKJZN1; ERKJM05; TRN: US1702181
Grant/Contract Number:
AC05-00OR22725; R0190-15-2012; 2015R1C1A1A0152105; AC52-06NA25396
Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Volume: 28; Journal Issue: 1; Journal ID: ISSN 1045-9219
Publisher:
IEEE
Research Org:
Sogang Univ., Seoul (Korea, Republic of); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org:
USDOE; Ministry of Science, ICT and Future Planning (MSIP) of Korea; National Research Foundation of Korea (NRF); USDOE Office of Science (SC). Advanced Scientific Computing Research (ASCR) (SC-21)
Contributing Orgs:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Data transfer; Servers; Throughput; Data models; Instruction sets; Big data; Computational modeling; File and storage systems; Parallel file systems; networks; I/O scheduling
OSTI Identifier:
1361284
Alternate Identifier(s):
OSTI ID: 1407899

Kim, Youngjae, Atchley, Scott, Vallee, Geoffroy R., Lee, Sangkeun, and Shipman, Galen M. Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure. United States: N. p., Web. doi:10.1109/TPDS.2016.2550439.
Kim, Youngjae, Atchley, Scott, Vallee, Geoffroy R., Lee, Sangkeun, & Shipman, Galen M. Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure. United States. doi:10.1109/TPDS.2016.2550439.
Kim, Youngjae, Atchley, Scott, Vallee, Geoffroy R., Lee, Sangkeun, and Shipman, Galen M. 2016. "Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure". United States. doi:10.1109/TPDS.2016.2550439. https://www.osti.gov/servlets/purl/1361284.
@article{osti_1361284,
title = {Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure},
author = {Kim, Youngjae and Atchley, Scott and Vallee, Geoffroy R. and Lee, Sangkeun and Shipman, Galen M.},
abstractNote = {While future terabit networks hold the promise of significantly improving big-data motion among geographically distributed data centers, significant challenges must be overcome even on today's 100 gigabit networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink; for instance, the data storage infrastructure at both the source and sink, and its interplay with the wide-area network, are increasingly the bottleneck to achieving high performance. In this study, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present a new bulk data movement framework for terabit networks, called LADS. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface to benefit from hardware-level zero-copy and operating system bypass capabilities when available. It can further improve data transfer performance under congestion on the end systems by buffering data at the source in flash storage. With our evaluations, we show that LADS can avoid congested storage elements within the shared storage resource, improving input/output bandwidth and data transfer rates across high-speed networks. We also investigate the performance degradation of LADS due to I/O contention on the parallel file system (PFS) when multiple LADS tools share the PFS. We design and evaluate a meta-scheduler that coordinates multiple I/O streams sharing the PFS to minimize I/O contention. Finally, our evaluations show that LADS with meta-scheduling can further improve performance by up to 14 percent relative to LADS without meta-scheduling.},
doi = {10.1109/TPDS.2016.2550439},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 1,
volume = 28,
place = {United States},
year = {2016},
month = {4}
}