DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on May 23, 2020

Title: Implementing efficient data compression and encryption in a persistent key-value store for HPC

Abstract

Persistent data structures such as key-value stores (KVSs), stored in a high-performance computing (HPC) system’s nonvolatile memory, have recently become an attractive solution to a number of emerging challenges, such as limited I/O performance. Data compression and encryption are two well-known techniques for improving several properties of such data-oriented systems. This article investigates how to efficiently integrate data compression and encryption into persistent KVSs for HPC, with the ultimate goal of hiding their costs and complexity in terms of performance and ease of use. Our compression technique exploits the deep memory hierarchy of an HPC system to achieve both storage reduction and performance improvement. Our encryption technique provides a practical level of security and enables secure sharing of sensitive data in complex scientific workflows at nearly imperceptible cost. We implement the proposed techniques on top of a distributed embedded KVS to evaluate the benefits and costs of incorporating these capabilities at different points along the dataflow path, illustrating differences in effective bandwidth, latency, and additional computational expense on the Swiss National Supercomputing Centre’s Grand Tavé and the National Energy Research Scientific Computing Center’s Cori.

Authors:
Kim, Jungwon [1]; Vetter, Jeffrey S. [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
May 2019
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1564200
Alternate Identifier(s):
OSTI ID: 1515585
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 33; Journal Issue: 6; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; High-performance computing; nonvolatile memory; persistent memory; distributed systems; programming systems

Citation Formats

Kim, Jungwon, and Vetter, Jeffrey S. Implementing efficient data compression and encryption in a persistent key-value store for HPC. United States: N. p., 2019. Web. doi:10.1177/1094342019847264.
Kim, Jungwon, & Vetter, Jeffrey S. Implementing efficient data compression and encryption in a persistent key-value store for HPC. United States. doi:10.1177/1094342019847264.
Kim, Jungwon, and Vetter, Jeffrey S. 2019. "Implementing efficient data compression and encryption in a persistent key-value store for HPC". United States. doi:10.1177/1094342019847264.
@article{osti_1564200,
title = {Implementing efficient data compression and encryption in a persistent key-value store for HPC},
author = {Kim, Jungwon and Vetter, Jeffrey S.},
doi = {10.1177/1094342019847264},
journal = {International Journal of High Performance Computing Applications},
number = 6,
volume = 33,
place = {United States},
year = {2019},
month = {5}
}
