Title: Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks

Abstract

High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications’ perspective. We have investigated data transfer requirements of climate applications as a typical scientific example and evaluated how the scientific community can benefit from next-generation high-bandwidth networks. We have experimented with current state-of-the-art data movement tools, and realized that there is no single solution for presetting transfer parameters for optimal use of the available bandwidth. Thus, we developed an adaptive transfer methodology for tuning and optimization in wide-area data transfers. This worked well with large files. However, typical scientific datasets may include many small files. Current file-centric data transfer protocols do not handle the transfer of small files well, even when using parallel streams or concurrent transfers over high-bandwidth networks. To overcome this problem, we developed a new block-based data movement method (in contrast to the current file-based methods) to improve data movement performance and efficiency in moving large scientific datasets that contain many small files. We implemented the new block-based data movement tool, which takes the approach of aggregating files into blocks and providing dynamic data channel management. In our work, we also realized that one of the major obstacles to the use of high-bandwidth networks is the limitation in host system resources. 100Gbps is beyond the capacity of today’s commodity machines, since we need a substantial amount of processing power and the involvement of multiple cores to fill a 40Gbps or 100Gbps network. As a result, host system performance plays an important role in the use of high-bandwidth networks.
We have conducted a large number of experiments with our new block-based method and with currently available file-based data movement tools. In this white paper, we describe future research problems and challenges for efficient use of next-generation science networks, based on the lessons learnt and the experiences gained with 100Gbps network applications.

Authors:
Balman, Mehmet [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
2012
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1171627
Report Number(s):
LBNL-6177E
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 100Gbps network applications

Citation Formats

Balman, Mehmet. Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks. United States: N. p., 2012. Web. doi:10.2172/1171627.
Balman, Mehmet. Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks. United States. doi:10.2172/1171627.
Balman, Mehmet. 2012. "Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks". United States. doi:10.2172/1171627. https://www.osti.gov/servlets/purl/1171627.
@article{osti_1171627,
title = {Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks},
author = {Balman, Mehmet},
abstractNote = {High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications’ perspective. We have investigated data transfer requirements of climate applications as a typical scientific example and evaluated how the scientific community can benefit from next-generation high-bandwidth networks. We have experimented with current state-of-the-art data movement tools, and realized that there is no single solution for presetting transfer parameters for optimal use of the available bandwidth. Thus, we developed an adaptive transfer methodology for tuning and optimization in wide-area data transfers. This worked well with large files. However, typical scientific datasets may include many small files. Current file-centric data transfer protocols do not handle the transfer of small files well, even when using parallel streams or concurrent transfers over high-bandwidth networks. To overcome this problem, we developed a new block-based data movement method (in contrast to the current file-based methods) to improve data movement performance and efficiency in moving large scientific datasets that contain many small files. We implemented the new block-based data movement tool, which takes the approach of aggregating files into blocks and providing dynamic data channel management. In our work, we also realized that one of the major obstacles to the use of high-bandwidth networks is the limitation in host system resources. 100Gbps is beyond the capacity of today’s commodity machines, since we need a substantial amount of processing power and the involvement of multiple cores to fill a 40Gbps or 100Gbps network. As a result, host system performance plays an important role in the use of high-bandwidth networks.
We have conducted a large number of experiments with our new block-based method and with currently available file-based data movement tools. In this white paper, we describe future research problems and challenges for efficient use of next-generation science networks, based on the lessons learnt and the experiences gained with 100Gbps network applications.},
doi = {10.2172/1171627},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2012},
month = {1}
}