GridFTP pipelining.
GridFTP is an exceptionally fast transfer protocol for large volumes of data. Implementations of it are widely deployed and used in well-connected Grid environments such as the TeraGrid because of its ability to scale to network speeds. However, when the data is partitioned into many small files instead of a few large files, it suffers from lower transfer rates. The latency between the serialized transfer requests for each file directly reduces the amount of time the data pathways are active, thus lowering achieved throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must go through the slow-start algorithm again. The performance penalty can be severe. This situation is known as the 'lots of small files' problem. In this paper we introduce a solution to this problem. This solution, called pipelining, allows many transfer requests to be sent to the server before any one completes. Thus, pipelining hides the latency of each transfer request by sending the requests while a data transfer is in progress. We present an implementation and performance study of the pipelining solution.
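The effect described in the abstract can be illustrated with a rough back-of-the-envelope timing model. The sketch below is not taken from the paper: the function names, the parameters, and the example numbers are all hypothetical, and the model assumes a fixed request latency per file, a constant link bandwidth, and no TCP slow-start penalty (which in practice would make the serialized case worse still).

```python
def serialized_time(n_files, file_bytes, bandwidth_bps, request_latency_s):
    """Estimated total time when each request waits for the previous transfer.

    Assumption: every file pays one full request latency during which the
    data pathway is idle, followed by its own transfer time.
    """
    transfer_s = file_bytes * 8 / bandwidth_bps
    return n_files * (request_latency_s + transfer_s)


def pipelined_time(n_files, file_bytes, bandwidth_bps, request_latency_s):
    """Estimated total time when requests are sent while data is in flight.

    Assumption: only the first request's latency is exposed; later requests
    overlap with ongoing transfers, so the pathway stays busy.
    """
    transfer_s = file_bytes * 8 / bandwidth_bps
    return request_latency_s + n_files * transfer_s


if __name__ == "__main__":
    # Hypothetical workload: 1000 files of 1 MB over a 1 Gbit/s link
    # with 50 ms of latency per transfer request.
    args = (1000, 1_000_000, 1e9, 0.05)
    print(f"serialized: {serialized_time(*args):.2f} s")
    print(f"pipelined:  {pipelined_time(*args):.2f} s")
```

Under these assumptions the gap between the two estimates grows with the number of files and with the request latency, which is consistent with the abstract's point that the penalty is tied to small files rather than to total data volume.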
- Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization: USDOE Office of Science (SC); SciDAC-2 CEDPS
- DOE Contract Number: DE-AC02-06CH11357
- OSTI ID: 971459
- Report Number(s): ANL/MCS/CP-58820; TRN: US201004%%19
- Resource Relation: Conference: TeraGrid '07; Jun. 6, 2007 - Jun. 8, 2007; Madison, WI
- Country of Publication: United States
- Language: English