OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: GridFTP pipelining.

Abstract

GridFTP is an exceptionally fast transfer protocol for large volumes of data. Because it can scale to network speeds, its implementations are widely deployed and used in well-connected Grid environments such as the TeraGrid. However, when the data is partitioned into many small files instead of a few large files, transfer rates drop. The latency between the serialized transfer requests for each file directly reduces the amount of time the data pathways are active, thus lowering achieved throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must go through the slow-start algorithm again. The performance penalty can be severe. This situation is known as the 'lots of small files' problem. In this paper we introduce a solution to this problem. This solution, called pipelining, allows many transfer requests to be sent to the server before any one completes. Thus, pipelining hides the latency of each transfer request by sending the requests while a data transfer is in progress. We present an implementation and performance study of the pipelining solution.
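
The latency argument in the abstract can be illustrated with a toy timing model. The Python sketch below is not the paper's implementation or its measurements. The function names, parameters, and example numbers are all assumptions chosen for illustration. It assumes each serialized request exposes one full round trip before its data flows, while pipelined requests expose only the first round trip. It also ignores TCP slow start, which the abstract says penalizes the serialized case further.

```python
def serialized_time(num_files, file_bytes, bandwidth_bps, rtt_s):
    """Total seconds when each request is sent only after the previous
    transfer completes (hypothetical model: one idle round trip per file)."""
    transfer_s = file_bytes * 8 / bandwidth_bps  # time to move one file
    return num_files * (rtt_s + transfer_s)


def pipelined_time(num_files, file_bytes, bandwidth_bps, rtt_s):
    """Total seconds when requests are sent while earlier data is still
    in flight (hypothetical model: only the first round trip is exposed)."""
    transfer_s = file_bytes * 8 / bandwidth_bps
    return rtt_s + num_files * transfer_s


if __name__ == "__main__":
    # Illustrative numbers only: 1000 files of 1 MB each,
    # a 1 Gb/s path, and a 50 ms round-trip time.
    args = (1000, 10**6, 10**9, 0.05)
    print("serialized:", serialized_time(*args), "s")
    print("pipelined: ", pipelined_time(*args), "s")
```

With these example inputs the model gives roughly 58 s for serialized requests and about 8 s for pipelined requests, so in this toy case most of the serialized time is request latency rather than data movement.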

Authors:
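Bresnahan, J.; Link, M.; Kettimuthu, R.; Fraser, D.; Foster, I.; Mathematics and Computer Science; Univ. of Chicago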
Publication Date:
2007-01-01
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); SciDAC-2 CEDPS
OSTI Identifier:
971459
Report Number(s):
ANL/MCS/CP-58820
TRN: US201004%%19
DOE Contract Number:
DE-AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: TeraGrid '07; Jun. 6, 2007 - Jun. 8, 2007; Madison, WI
Country of Publication:
United States
Language:
ENGLISH
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; G CODES; DATA TRANSMISSION; IMPLEMENTATION; PERFORMANCE

Citation Formats

Bresnahan, J., Link, M., Kettimuthu, R., Fraser, D., Foster, I., Mathematics and Computer Science, and Univ. of Chicago. GridFTP pipelining. United States: N. p., 2007. Web.
Bresnahan, J., Link, M., Kettimuthu, R., Fraser, D., Foster, I., Mathematics and Computer Science, & Univ. of Chicago. (2007). GridFTP pipelining. United States.
Bresnahan, J., Link, M., Kettimuthu, R., Fraser, D., Foster, I., Mathematics and Computer Science, and Univ. of Chicago. 2007. "GridFTP pipelining." United States.
@article{osti_971459,
title = {GridFTP pipelining.},
author = {Bresnahan, J. and Link, M. and Kettimuthu, R. and Fraser, D. and Foster, I. and {Mathematics and Computer Science} and {Univ. of Chicago}},
place = {United States},
year = {2007},
month = {1}
}

Similar Records:
  • GridFTP is a high-performance, reliable data transfer protocol optimized for high-bandwidth wide-area networks. Based on the Internet FTP protocol, it defines extensions for high-performance operation and security. The Globus implementation of GridFTP provides a modular and extensible data transfer system architecture suitable for wide area and high-performance environments. GridFTP is the de facto standard in projects requiring secure, robust, high-speed bulk data transport. For example, the high energy physics community is basing its entire tiered data movement infrastructure for the Large Hadron Collider computing Grid on GridFTP; the Laser Interferometer Gravitational Wave Observatory routinely uses GridFTP to move 1 TB a day during production runs; and GridFTP is the recommended data transfer mechanism to maximize data transfer rates on the TeraGrid. Commonly used GridFTP clients include globus-url-copy, uberftp, and the Globus Reliable File Transfer service. In this paper, we present a Globus XIO based client to GridFTP that provides a simple Open/Close/Read/Write (OCRW) interface to the users. Such a client greatly eases the addition of GridFTP support to third-party programs, such as SRB and MPICH-G2. Further, this client provides an easier and familiar interface for applications to efficiently access remote files. We compare the performance of this client with that of globus-url-copy on multiple endpoints in the TeraGrid infrastructure. We perform both memory-to-memory and disk-to-disk transfers and show that the performance of this OCRW client is comparable to that of globus-url-copy. We also show that our GridFTP client significantly outperforms the GPFS WAN on the TeraGrid.
  • The author presents a new general approach to the concept of functional computational models based on a specific type of functional decomposition. It employs a great number of internal functions of at most two variables computed concurrently in a pipeline mode and a relatively small number of external aggregate functions of many variables computed in the processors controlling these pipeline streams. As a fundamental technique for this organization the author suggests associative pipelining, an algorithmically dual mechanism to the associative processor. This kind of computational model has natural potential for a high degree of concurrency and allows efficient VLSI implementation due to the uniformity properties of associative pipelining. 7 references.
  • A formal link between the data flow model of MIMD computation and the design and analysis of systolic systems is discussed. To establish the relationship between these two models of computation, a small set of functional operators is described; these make it possible to express many vector and array algorithms as networks of interacting data-driven processes. By the use of these tools, it is then shown that the data flow graphs of many functions can be reformulated as systolic systems. The main result of the paper is a theorem which gives conditions which will guarantee that the systolic version of the computation graph will perform asymptotically as fast as a fully concurrent execution of the original data flow graph. 5 references.
  • A particular approach to specifying procedure interconnection and allocation is presented. The major result is that, within stated assumptions, networks constructed using a small set of structured process connectives can achieve at least as good throughput (pipelining performance) as arbitrarily interconnected networks. 20 references.