DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: FTLADS: Object-Logging Based Fault-Tolerant Big Data Transfer System Using Layout Aware Data Scheduling

Abstract

The layout-aware data scheduling (LADS) data movement framework mitigates congestion in end-to-end data transfers. During data transfer, LADS can avoid congested storage elements by exploiting the underlying storage layout at each endpoint. This improves the I/O bandwidth and hence the data transfer rate across high-speed networks. However, the absence of fault tolerance (FT) in LADS causes data retransmission overhead and may lead to data integrity issues when faults occur. In this paper, we propose object-logging FT mechanisms that avoid retransmitting objects that have already been successfully written to the parallel file system (PFS) at the sink end. Based on the number of log files created for the whole dataset, we classify our FT mechanisms into three categories: file logger, transaction logger, and universal logger. To address the space overhead of logging, we also propose different methods of recording the successfully transferred objects in the log files. We evaluate the data transfer performance and recovery time overhead of the proposed object-logging-based FT mechanisms on the LADS data transfer framework. Our experimental results show that the FT mechanisms add negligible overhead (< 1%) to the data transfer time. Yet, the fault recovery time is 10% higher than the total data transfer time at any fault point.
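The object-logging idea described in the abstract can be illustrated with a small Python sketch. It is not the FTLADS code: the class `FileLogger`, the function `transfer`, the per-object bitmap encoding, and the (file name, object index) identifiers are hypothetical, chosen only to show how a sink-side log lets a restarted transfer skip objects that were already written to the PFS.

```python
from typing import Dict, List, Optional


class FileLogger:
    """Hypothetical per-file object log kept as a bitmap (1 bit per object).

    Assumption: in a real system the bitmap would be persisted on the sink and
    updated only after an object is durably written to the PFS; here it lives
    in memory for illustration.
    """

    def __init__(self) -> None:
        self.logs: Dict[str, bytearray] = {}  # file name -> bitmap
        self.sizes: Dict[str, int] = {}       # file name -> number of objects

    def open_file(self, name: str, num_objects: int) -> None:
        # Reuse an existing log on restart; otherwise start with all bits clear.
        self.sizes[name] = num_objects
        self.logs.setdefault(name, bytearray((num_objects + 7) // 8))

    def mark_written(self, name: str, index: int) -> None:
        # Set the bit for this object once it is safely on the sink.
        self.logs[name][index // 8] |= 1 << (index % 8)

    def is_written(self, name: str, index: int) -> bool:
        return bool((self.logs[name][index // 8] >> (index % 8)) & 1)

    def pending(self, name: str) -> List[int]:
        """Object indices that still have to be sent."""
        return [i for i in range(self.sizes[name]) if not self.is_written(name, i)]


def transfer(logger: FileLogger, name: str, num_objects: int,
             fail_after: Optional[int] = None) -> List[int]:
    """Send every object not yet logged; stop early to mimic a fault.

    Returns the object indices sent in this run.
    """
    logger.open_file(name, num_objects)
    sent: List[int] = []
    for i in logger.pending(name):
        if fail_after is not None and len(sent) == fail_after:
            break  # simulated fault: the remaining objects stay unlogged
        # (Writing object i to the sink's PFS would happen here.)
        logger.mark_written(name, i)
        sent.append(i)
    return sent


log = FileLogger()
first = transfer(log, "data.bin", 10, fail_after=6)  # fault after 6 objects
second = transfer(log, "data.bin", 10)               # recovery run
print(first, second)  # [0, 1, 2, 3, 4, 5] [6, 7, 8, 9]
```

The bitmap is only one compact encoding assumed here to keep space overhead to roughly one bit per object. Moving from one log per file to one log per transaction or a single universal log would presumably change how many of these bitmaps share a log file, not the skip-on-restart logic.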

Authors:
Kasu, Preethika [1]; Kim, Taeuk [2]; Um, Jung-Ho [3]; Park, Kyongseok [3]; Atchley, Scott [4]; Kim, Youngjae [2]
  1. Ajou Univ., Suwon (South Korea)
  2. TmaxCloud, Seongnam (South Korea)
  3. Korea Inst. of Science and Technology Information, Daejeon (South Korea)
  4. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
March 2019
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1530069
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Access
Additional Journal Information:
Journal Volume: 7; Journal Issue: 1; Journal ID: ISSN 2169-3536
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Big data; geo-distributed data centers; fault tolerance; parallel system

Citation Formats

MLA: Kasu, Preethika, Kim, Taeuk, Um, Jung-Ho, Park, Kyongseok, Atchley, Scott, and Kim, Youngjae. FTLADS: Object-Logging Based Fault-Tolerant Big Data Transfer System Using Layout Aware Data Scheduling. United States: N. p., 2019. Web. doi:10.1109/ACCESS.2019.2905158.
APA: Kasu, Preethika, Kim, Taeuk, Um, Jung-Ho, Park, Kyongseok, Atchley, Scott, & Kim, Youngjae. FTLADS: Object-Logging Based Fault-Tolerant Big Data Transfer System Using Layout Aware Data Scheduling. United States. doi:10.1109/ACCESS.2019.2905158.
Chicago: Kasu, Preethika, Kim, Taeuk, Um, Jung-Ho, Park, Kyongseok, Atchley, Scott, and Kim, Youngjae. 2019. "FTLADS: Object-Logging Based Fault-Tolerant Big Data Transfer System Using Layout Aware Data Scheduling". United States. doi:10.1109/ACCESS.2019.2905158. https://www.osti.gov/servlets/purl/1530069.
@article{osti_1530069,
title = {{FTLADS}: Object-Logging Based Fault-Tolerant Big Data Transfer System Using Layout Aware Data Scheduling},
author = {Kasu, Preethika and Kim, Taeuk and Um, Jung-Ho and Park, Kyongseok and Atchley, Scott and Kim, Youngjae},
abstractNote = {The layout-aware data scheduling (LADS) data movement framework mitigates congestion in end-to-end data transfers. During data transfer, LADS can avoid congested storage elements by exploiting the underlying storage layout at each endpoint. This improves the I/O bandwidth and hence the data transfer rate across high-speed networks. However, the absence of fault tolerance (FT) in LADS causes data retransmission overhead and may lead to data integrity issues when faults occur. In this paper, we propose object-logging FT mechanisms that avoid retransmitting objects that have already been successfully written to the parallel file system (PFS) at the sink end. Based on the number of log files created for the whole dataset, we classify our FT mechanisms into three categories: file logger, transaction logger, and universal logger. To address the space overhead of logging, we also propose different methods of recording the successfully transferred objects in the log files. We evaluate the data transfer performance and recovery time overhead of the proposed object-logging-based FT mechanisms on the LADS data transfer framework. Our experimental results show that the FT mechanisms add negligible overhead (< 1%) to the data transfer time. Yet, the fault recovery time is 10% higher than the total data transfer time at any fault point.},
doi = {10.1109/ACCESS.2019.2905158},
journal = {IEEE Access},
number = 1,
volume = 7,
place = {United States},
year = {2019},
month = {3}
}
