OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: On the Use of GPUs in Realizing Cost-Effective Distributed RAID

Abstract

The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design an innovative solution to achieve a flexible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system utilizes low-cost, strategically placed GPUs - both on the client and server sides - to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length and location of a RAID array on a per file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely-used Lustre PFS, and show that our approach is feasible and imposes acceptable overhead.

Authors:
 Khasymski, Aleksandr [1]; Rafique, Mustafa [2]; Butt, Ali R [1]; Vazhkudai, Sudharshan S [3]; Nikolopoulos, Dimitrios [4]
  1. Virginia Polytechnic Institute and State University (Virginia Tech)
  2. Qatar Foundation
  3. ORNL
  4. Queen's University, Belfast
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program
OSTI Identifier:
1048750
DOE Contract Number:  
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 12), Washington, DC, USA, August 7, 2012
Country of Publication:
United States
Language:
English
Subject:
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS; COMPUTERS; DESIGN; PARITY; PERFORMANCE; SIMULATION; STORAGE; TOLERANCE; DATA BASE MANAGEMENT; DATA COMPILATION

Citation Formats

Khasymski, Aleksandr, Rafique, Mustafa, Butt, Ali R, Vazhkudai, Sudharshan S, and Nikolopoulos, Dimitrios. On the Use of GPUs in Realizing Cost-Effective Distributed RAID. United States: N. p., 2012. Web.
Khasymski, Aleksandr, Rafique, Mustafa, Butt, Ali R, Vazhkudai, Sudharshan S, & Nikolopoulos, Dimitrios. On the Use of GPUs in Realizing Cost-Effective Distributed RAID. United States.
Khasymski, Aleksandr, Rafique, Mustafa, Butt, Ali R, Vazhkudai, Sudharshan S, and Nikolopoulos, Dimitrios. 2012. "On the Use of GPUs in Realizing Cost-Effective Distributed RAID". United States.
@inproceedings{osti_1048750,
title = {On the Use of GPUs in Realizing Cost-Effective Distributed RAID},
author = {Khasymski, Aleksandr and Rafique, Mustafa and Butt, Ali R and Vazhkudai, Sudharshan S and Nikolopoulos, Dimitrios},
abstractNote = {The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design an innovative solution to achieve a flexible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system utilizes low-cost, strategically placed GPUs - both on the client and server sides - to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length and location of a RAID array on a per file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely-used Lustre PFS, and show that our approach is feasible and imposes acceptable overhead.},
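booktitle = {Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 12)},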
place = {United States},
year = {2012},
month = {1}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
