OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Improving I/O Throughput with PRIMACY: Preconditioning ID-Mapper for Compressing Incompressibility

Abstract

The ability to efficiently handle massive amounts of data is necessary for the continuing development towards exascale scientific data-mining applications and database systems. Unfortunately, recent years have shown a growing gap between the size and complexity of data produced from scientific applications and the limited I/O bandwidth available on modern high-performance computing systems. Utilizing data compression to lower the degree of I/O activity offers a promising means of addressing this problem. However, the standard compression algorithms previously explored for such use offer limited gains on both the end-to-end throughput and storage fronts. In this paper, we introduce an in-situ compression scheme aimed at improving end-to-end I/O throughput as well as reducing dataset size. Our technique, PRIMACY (Preconditioning Id-MApper for Compressing incompressibility), acts as a preconditioner for standard compression libraries by modifying the representation of the original floating-point scientific data to increase byte-level repeatability, allowing standard lossless compressors to take advantage of their entropy-based byte-level encoding schemes. We additionally present a theoretical model for compression efficiency in high-performance computing environments and evaluate the efficiency of our approach via comparative analysis. Based on our evaluations on 20 real-world scientific datasets, PRIMACY achieved up to 38% and 22% improvements upon standard end-to-end write and read throughputs, respectively, in addition to a 25% increase in compression ratios paired with a 3-to-4-fold improvement in both compression and decompression throughput over general-purpose compressors.
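The abstract describes the preconditioning idea only at a high level. As a rough, hypothetical illustration of what "modifying the representation to increase byte-level repeatability" can mean, the Python sketch below splits each float64 into its two high-order bytes (sign, exponent, leading mantissa bits) and six low-order bytes, replaces each distinct high-order pair with a frequency-ranked ID, and hands only the ID stream to zlib. The 2/6 byte split, the frequency ranking, the use of zlib, and the names `precondition`, `restore`, `PREFIX_BYTES`, and `LOW_BYTES` are all assumptions made for this example; it is not the published PRIMACY algorithm.

```python
import struct
import zlib
from collections import Counter

# HYPOTHETICAL sketch of byte-level preconditioning for float64 data.
# It is NOT the published PRIMACY algorithm; the 2-byte/6-byte split,
# the frequency-ranked ID table, and zlib are assumptions for illustration.

PREFIX_BYTES = 2  # sign bit, 11 exponent bits, top 4 mantissa bits
LOW_BYTES = 8 - PREFIX_BYTES


def precondition(values):
    """Return a dict holding an ID table, a zlib-compressed ID stream,
    and the raw low-order bytes of each float64 in `values`."""
    raw = [struct.pack(">d", v) for v in values]  # big-endian: high-order bytes first
    prefixes = [r[:PREFIX_BYTES] for r in raw]
    # Rank distinct prefixes by frequency so the most common one gets ID 0.
    ranked = [p for p, _ in Counter(prefixes).most_common()]
    id_of = {p: i for i, p in enumerate(ranked)}
    # One byte per ID when it fits; two bytes always suffice (at most 2**16 prefixes).
    id_width = 1 if len(ranked) <= 256 else 2
    ids = b"".join(id_of[p].to_bytes(id_width, "big") for p in prefixes)
    return {
        "count": len(values),
        "id_width": id_width,
        "id_table": b"".join(ranked),
        "ids": zlib.compress(ids, 9),
        # Low-order mantissa bytes look nearly random, so this sketch stores them as-is.
        "low": b"".join(r[PREFIX_BYTES:] for r in raw),
    }


def restore(blob):
    """Invert precondition(); the round trip is bit-exact."""
    table = blob["id_table"]
    ranked = [table[i:i + PREFIX_BYTES] for i in range(0, len(table), PREFIX_BYTES)]
    ids = zlib.decompress(blob["ids"])
    w = blob["id_width"]
    low = blob["low"]
    out = []
    for k in range(blob["count"]):
        prefix = ranked[int.from_bytes(ids[k * w:(k + 1) * w], "big")]
        tail = low[k * LOW_BYTES:(k + 1) * LOW_BYTES]
        out.append(struct.unpack(">d", prefix + tail)[0])
    return out


if __name__ == "__main__":
    data = [0.001 * i * i for i in range(10_000)]  # smooth synthetic field
    blob = precondition(data)
    assert restore(blob) == data
    baseline = len(zlib.compress(b"".join(struct.pack(">d", v) for v in data), 9))
    preconditioned = len(blob["id_table"]) + len(blob["ids"]) + len(blob["low"])
    print("zlib on raw bytes:", baseline, "preconditioned:", preconditioned)
```

Because each stored prefix maps to exactly one ID, `restore` reproduces the input bit for bit. Whether the preconditioned form is actually smaller than plain zlib output depends on the data, so the sizes printed by the demo should be compared on real datasets rather than assumed.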

Authors:
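Shah, Neil; Schendel, Eric R.; Lakshminarasimhan, Sriram; Pendse, Saurabh V.; Rogers, Terry; Samatova, Nagiza F.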
Publication Date:
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567308
Resource Type:
Conference
Journal Name:
2012 IEEE International Conference on Cluster Computing (CLUSTER)
Additional Journal Information:
Conference: 2012 IEEE International Conference on Cluster Computing, September 24-28, 2012, Beijing, China
Country of Publication:
United States
Language:
English
Subject:
Computer Science

Citation Formats

Shah, Neil, Schendel, Eric R., Lakshminarasimhan, Sriram, Pendse, Saurabh V., Rogers, Terry, and Samatova, Nagiza F. Improving I/O Throughput with PRIMACY: Preconditioning ID-Mapper for Compressing Incompressibility. United States: N. p., 2012. Web. doi:10.1109/CLUSTER.2012.16.
Shah, Neil, Schendel, Eric R., Lakshminarasimhan, Sriram, Pendse, Saurabh V., Rogers, Terry, & Samatova, Nagiza F. Improving I/O Throughput with PRIMACY: Preconditioning ID-Mapper for Compressing Incompressibility. United States. doi:10.1109/CLUSTER.2012.16.
Shah, Neil, Schendel, Eric R., Lakshminarasimhan, Sriram, Pendse, Saurabh V., Rogers, Terry, and Samatova, Nagiza F. 2012. "Improving I/O Throughput with PRIMACY: Preconditioning ID-Mapper for Compressing Incompressibility". United States. doi:10.1109/CLUSTER.2012.16.
@inproceedings{osti_1567308,
title = {Improving {I/O} Throughput with {PRIMACY}: Preconditioning {ID}-Mapper for Compressing Incompressibility},
author = {Shah, Neil and Schendel, Eric R. and Lakshminarasimhan, Sriram and Pendse, Saurabh V. and Rogers, Terry and Samatova, Nagiza F.},
abstractNote = {The ability to efficiently handle massive amounts of data is necessary for the continuing development towards exascale scientific data-mining applications and database systems. Unfortunately, recent years have shown a growing gap between the size and complexity of data produced from scientific applications and the limited I/O bandwidth available on modern high-performance computing systems. Utilizing data compression to lower the degree of I/O activity offers a promising means of addressing this problem. However, the standard compression algorithms previously explored for such use offer limited gains on both the end-to-end throughput and storage fronts. In this paper, we introduce an in-situ compression scheme aimed at improving end-to-end I/O throughput as well as reducing dataset size. Our technique, PRIMACY (Preconditioning Id-MApper for Compressing incompressibility), acts as a preconditioner for standard compression libraries by modifying the representation of the original floating-point scientific data to increase byte-level repeatability, allowing standard lossless compressors to take advantage of their entropy-based byte-level encoding schemes. We additionally present a theoretical model for compression efficiency in high-performance computing environments and evaluate the efficiency of our approach via comparative analysis. Based on our evaluations on 20 real-world scientific datasets, PRIMACY achieved up to 38% and 22% improvements upon standard end-to-end write and read throughputs, respectively, in addition to a 25% increase in compression ratios paired with a 3-to-4-fold improvement in both compression and decompression throughput over general-purpose compressors.},
doi = {10.1109/CLUSTER.2012.16},
booktitle = {2012 {IEEE} International Conference on Cluster Computing ({CLUSTER})},
place = {United States},
year = {2012},
month = oct
}
