Title: Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure

While future terabit networks hold the promise of significantly improving big-data motion among geographically distributed data centers, significant challenges must be overcome even on today's 100 gigabit networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink; for instance, the data storage infrastructure at both the source and sink and its interplay with the wide-area network are increasingly the bottleneck to achieving high performance. In this study, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present a new bulk data movement framework for terabit networks, called LADS. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface to benefit from hardware-level zero-copy and operating system bypass capabilities when available. It can further improve data transfer performance under congestion on the end systems by buffering at the source using flash storage. With our evaluations, we show that LADS can avoid congested storage elements within the shared storage resource, improving input/output bandwidth and data transfer rates across the high speed networks. We also investigate the performance degradation problems of LADS due to I/O contention on the parallel file system (PFS) when multiple LADS tools share the PFS. We design and evaluate a meta-scheduler to coordinate multiple I/O streams while sharing the PFS, to minimize the I/O contention on the PFS. Finally, with our evaluations, we observe that LADS with meta-scheduling can further improve the performance by up to 14 percent relative to LADS without meta-scheduling.
[1]; [2]; [2]; [2]; [3]
  1. Sogang Univ., Seoul (Korea, Republic of). Dept. of Computer Science and Engineering
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  3. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
Report Number(s):
Journal ID: ISSN 1045-9219; KJ0502000; KJ0404000; ERKJZN1; ERKJM05; TRN: US1702181
Grant/Contract Number:
AC05-00OR22725; R0190-15-2012; 2015R1C1A1A0152105; AC52-06NA25396
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Volume: 28; Journal Issue: 1; Journal ID: ISSN 1045-9219
Research Org:
Sogang Univ., Seoul (Korea, Republic of); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org:
USDOE; Ministry of Science, ICT and Future Planning (MSIP) of Korea; National Research Foundation of Korea (NRF); USDOE Office of Science (SC). Advanced Scientific Computing Research (ASCR) (SC-21)
Contributing Orgs:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Country of Publication:
United States
Subject:
97 MATHEMATICS AND COMPUTING; Data transfer; Servers; Throughput; Data models; Instruction sets; Big data; Computational modeling; File and storage systems; Parallel file systems; networks; I/O scheduling
OSTI Identifier:
Alternate Identifier(s):
OSTI ID: 1407899