OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: High-performance End-to-End Integrity Verification on Big Data Transfer

Abstract

The scale of scientific data generated by experimental facilities and by simulations at high-performance computing facilities has been proliferating with the emergence of IoT-based big data. In many cases, this data must be transmitted rapidly and reliably to remote facilities for storage, analysis, or sharing, as in Internet of Things (IoT) applications. At the destination, the data can be verified using a checksum after it has been written to disk to ensure its integrity. However, this end-to-end integrity verification inevitably creates overhead (extra disk I/O and additional computation), increasing the overall data transfer time. In this article, we evaluate strategies to maximize the overlap between data transfer and checksum computation for astronomical observation data. Specifically, we examine file-level and block-level (with various block sizes) pipelining to overlap data transfer and checksum computation. We analyze these pipelining approaches in the context of GridFTP, a widely used protocol for scientific data transfers. Theoretical analysis and experiments are conducted to evaluate our methods. The results show that block-level pipelining is effective in maximizing this overlap, and can reduce the overall data transfer time with end-to-end integrity verification by up to 70% compared to the sequential execution of transfer and checksum, and by up to 60% compared to file-level pipelining.

Authors:
Jung, Eun-Sung; Liu, Si; Kettimuthu, Rajkumar; Jung, Sung Wook
Publication Date:
2019
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22); National Science Foundation (NSF); Hongik University
OSTI Identifier:
1573256
DOE Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article
Journal Name:
IEICE Transactions on Information and Systems
Additional Journal Information:
Journal Volume: E102D; Journal Issue: 8
Country of Publication:
United States
Language:
English
Subject:
IoT-based big data; data integrity; high-performance data transfer; pipelining

Citation Formats

Jung, Eun-Sung, Liu, Si, Kettimuthu, Rajkumar, and Jung, Sung Wook. High-performance End-to-End Integrity Verification on Big Data Transfer. United States: N. p., 2019. Web. doi:10.1587/transinf.2018EDP7297.
Jung, Eun-Sung, Liu, Si, Kettimuthu, Rajkumar, & Jung, Sung Wook. High-performance End-to-End Integrity Verification on Big Data Transfer. United States. doi:10.1587/transinf.2018EDP7297.
Jung, Eun-Sung, Liu, Si, Kettimuthu, Rajkumar, and Jung, Sung Wook. 2019. "High-performance End-to-End Integrity Verification on Big Data Transfer". United States. doi:10.1587/transinf.2018EDP7297.
@article{osti_1573256,
title = {High-performance End-to-End Integrity Verification on Big Data Transfer},
author = {Jung, Eun-Sung and Liu, Si and Kettimuthu, Rajkumar and Jung, Sung Wook},
abstractNote = {The scale of scientific data generated by experimental facilities and simulations in high-performance computing facilities has been proliferating with the emergence of IoT-based big data. In many cases, this data must be transmitted rapidly and reliably to remote facilities for storage, analysis, or sharing, for the Internet of Things (IoT) applications. Simultaneously, IoT data can be verified using a checksum after the data has been written to the disk at the destination to ensure its integrity. However, this end-to-end integrity verification inevitably creates overheads (extra disk I/O and more computation). Thus, the overall data transfer time increases. In this article, we evaluate strategies to maximize the overlap between data transfer and checksum computation for astronomical observation data. Specifically, we examine file-level and block-level (with various block sizes) pipelining to overlap data transfer and checksum computation. We analyze these pipelining approaches in the context of GridFTP, a widely used protocol for scientific data transfers. Theoretical analysis and experiments are conducted to evaluate our methods. The results show that block-level pipelining is effective in maximizing the overlap mentioned above, and can improve the overall data transfer time with end-to-end integrity verification by up to 70% compared to the sequential execution of transfer and checksum, and by up to 60% compared to file-level pipelining.},
doi = {10.1587/transinf.2018EDP7297},
journal = {IEICE Transactions on Information and Systems},
number = 8,
volume = {E102D},
place = {United States},
year = {2019},
month = {8}
}