Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure
- Sogang Univ., Seoul (Korea, Republic of). Dept. of Computer Science and Engineering
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
While future terabit networks hold the promise of significantly improving big-data motion among geographically distributed data centers, significant challenges must be overcome even on today's 100 gigabit networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink; for instance, the data storage infrastructure at both the source and sink, and its interplay with the wide-area network, are increasingly the bottleneck to achieving high performance. In this study, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present a new bulk data movement framework for terabit networks, called LADS. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface to benefit from hardware-level zero-copy and operating system bypass capabilities when available. It can further improve data transfer performance under congestion on the end systems by buffering data in flash storage at the source. Our evaluations show that LADS can avoid congested storage elements within the shared storage resource, improving input/output bandwidth and data transfer rates across high-speed networks. We also investigate the performance degradation of LADS caused by I/O contention on the parallel file system (PFS) when multiple LADS tools share the PFS. We design and evaluate a meta-scheduler that coordinates multiple I/O streams sharing the PFS to minimize this I/O contention. Finally, our evaluations show that LADS with meta-scheduling can further improve performance by up to 14 percent relative to LADS without meta-scheduling.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Sogang Univ., Seoul (Korea, Republic of); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- DOE Office of Science; USDOE; Ministry of Science, ICT and Future Planning (MSIP) (Korea, Republic of); National Research Foundation of Korea (NRF) (Korea, Republic of)
- Contributing Organization:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1361284
- Alternate ID(s):
- OSTI ID: 1407899
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue 1; ISSN 1045-9219
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
Optimizing communication performance in scale-out storage system (journal, July 2018)
Async-LCAM: a lock contention aware messenger for Ceph distributed storage system (journal, July 2018)
New Bargaining Game Model for Collaborative Vehicular Network Services (journal, March 2019)
Similar Records
NUMA-Aware Thread Scheduling for Big Data Transfers over Terabits Network Infrastructure
Layout-Aware I/O Scheduling for Terabits Data Movement