Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data

Zou, Xiangyu; Lu, Tao; Xia, Wen; Wang, Xuan; Zhang, Weizhe; Zhang, Haijun; Di, Sheng; Tao, Dingwen; Cappello, Franck

doi:10.1109/TPDS.2020.2972548


Abstract

Scientific simulations in high-performance computing (HPC) environments generate vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for postanalysis. Unlike traditional data reduction schemes such as deduplication or lossless compression, not only can error-controlled lossy compression significantly reduce the data size, but it also holds the promise of satisfying user demand on error control. Pointwise relative error bounds (i.e., compression errors depend on the data values) are widely used by many scientific applications with lossy compression, since error control can adapt to the error bound in the dataset automatically. However, pointwise relative-error-bounded compression is complicated and time consuming. In this article, we develop efficient precomputation-based mechanisms based on the SZ lossy compression framework. Our mechanisms can avoid costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating the relative-error-bounded compression with excellent compression ratios. In addition, we reduce traversing operations for Huffman decoding, significantly accelerating the decompression process in SZ. Experiments with eight well-known real-world scientific simulation datasets show that our solution can improve the compression and decompression rates (i.e., the speed) by about 40 and 80 percent, respectively, in most cases, making our designed lossy compression strategy the best-in-class solution in most cases.
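The abstract mentions a logarithmic transformation that the precomputation-based mechanisms are designed to avoid. As a rough illustration of why such a transformation appears at all, the sketch below quantizes log2|x| with a uniform step of 2*log2(1 + eps), which keeps each reconstructed value within a pointwise relative error of about eps (up to floating-point rounding). This is a minimal sketch of the general idea only, not the SZ implementation or the authors' table-lookup method; the function names `compress_rel` and `decompress_rel` and the handling of zeros are assumptions made for the example.

```python
import math


def compress_rel(values, eps):
    """Map each nonzero value to (sign, bin index) in the log2 domain.

    A bin width of 2*log2(1+eps) means the bin center differs from
    log2|x| by at most log2(1+eps), i.e. a relative error of at most eps.
    """
    step = 2.0 * math.log2(1.0 + eps)
    codes = []
    for x in values:
        if x == 0.0:
            codes.append(None)  # assumption: zeros are stored exactly
        else:
            sign = 1 if x > 0 else -1
            codes.append((sign, round(math.log2(abs(x)) / step)))
    return codes


def decompress_rel(codes, eps):
    """Rebuild approximate values from the (sign, bin index) codes."""
    step = 2.0 * math.log2(1.0 + eps)
    return [0.0 if c is None else c[0] * 2.0 ** (c[1] * step) for c in codes]


if __name__ == "__main__":
    data = [1.5, -0.002, 0.0, 123456.789, -7.0e-9]
    eps = 0.01
    restored = decompress_rel(compress_rel(data, eps), eps)
    for x, y in zip(data, restored):
        rel = 0.0 if x == 0.0 else abs(y - x) / abs(x)
        print(f"{x!r} -> {y!r} (relative error {rel:.2e})")
```

The per-value `math.log2` call is the kind of cost that a lookup table over precomputed bin boundaries could replace; how the article builds and indexes such a table is not described in this record.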

Authors:

Zou, Xiangyu [1]; Lu, Tao [2]; Xia, Wen; Wang, Xuan; Zhang, Weizhe; Zhang, Haijun [1]; Di, Sheng [3]; Tao, Dingwen [4]; Cappello, Franck [3]

[1] Harbin Inst. of Technology (China)
[2] Marvell Technology Group, Santa Clara, CA (United States)
[3] Argonne National Lab. (ANL), Argonne, IL (United States)
[4] Univ. of Alabama, Tuscaloosa, AL (United States)

Publication Date: 2020-02-10

Research Org.: Argonne National Lab. (ANL), Argonne, IL (United States)

Sponsoring Org.: National Science Foundation (NSF); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Contributing Org.: National Key Research and Development Program of China

OSTI Identifier: 1603491

Grant/Contract Number: AC02-06CH11357

Resource Type: Accepted Manuscript

Journal Name: IEEE Transactions on Parallel and Distributed Systems

Additional Journal Information: Journal Volume: 31; Journal Issue: 7; Journal ID: ISSN 1045-9219

Publisher: IEEE

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; Lossy compression; compression rate; high-performance computing; scientific data

Citation Formats


Zou, Xiangyu, Lu, Tao, Xia, Wen, Wang, Xuan, Zhang, Weizhe, Zhang, Haijun, Di, Sheng, Tao, Dingwen, and Cappello, Franck. Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data. United States: N. p., 2020. Web. doi:10.1109/TPDS.2020.2972548.

Zou, Xiangyu, Lu, Tao, Xia, Wen, Wang, Xuan, Zhang, Weizhe, Zhang, Haijun, Di, Sheng, Tao, Dingwen, & Cappello, Franck. Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data. United States. https://doi.org/10.1109/TPDS.2020.2972548

Zou, Xiangyu, Lu, Tao, Xia, Wen, Wang, Xuan, Zhang, Weizhe, Zhang, Haijun, Di, Sheng, Tao, Dingwen, and Cappello, Franck. 2020. "Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data". United States. https://doi.org/10.1109/TPDS.2020.2972548. https://www.osti.gov/servlets/purl/1603491.

@article{osti_1603491,
  title        = {Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data},
  author       = {Zou, Xiangyu and Lu, Tao and Xia, Wen and Wang, Xuan and Zhang, Weizhe and Zhang, Haijun and Di, Sheng and Tao, Dingwen and Cappello, Franck},
  abstractNote = {Scientific simulations in high-performance computing (HPC) environments generate vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for postanalysis. Unlike traditional data reduction schemes such as deduplication or lossless compression, not only can error-controlled lossy compression significantly reduce the data size, but it also holds the promise of satisfying user demand on error control. Pointwise relative error bounds (i.e., compression errors depend on the data values) are widely used by many scientific applications with lossy compression, since error control can adapt to the error bound in the dataset automatically. However, pointwise relative-error-bounded compression is complicated and time consuming. In this article, we develop efficient precomputation-based mechanisms based on the SZ lossy compression framework. Our mechanisms can avoid costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating the relative-error-bounded compression with excellent compression ratios. In addition, we reduce traversing operations for Huffman decoding, significantly accelerating the decompression process in SZ. Experiments with eight well-known real-world scientific simulation datasets show that our solution can improve the compression and decompression rates (i.e., the speed) by about 40 and 80 percent, respectively, in most cases, making our designed lossy compression strategy the best-in-class solution in most cases.},
  doi          = {10.1109/TPDS.2020.2972548},
  journal      = {IEEE Transactions on Parallel and Distributed Systems},
  number       = 7,
  volume       = 31,
  place        = {United States},
  year         = {2020},
  month        = feb
}


Publisher's Version of Record

https://doi.org/10.1109/TPDS.2020.2972548


Citation Metrics:

Cited by: 5 works

Citation information provided by
Web of Science

Similar Records in DOE PAGES and OSTI.GOV collections:

Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanisms

Conference: Zou, Xiangyu; Lu, Tao; Xia, Wen; ...

Scientific simulations in high-performance computing (HPC) environments are producing vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for post-analysis. Unlike the traditional data reduction schemes (such as deduplication or lossless compression), not only can error-controlled lossy compression significantly reduce the data size, but it also holds the promise of satisfying user demand on error control. Point-wise relative error bounds (i.e., compression errors depend on the data values) are widely used by many scientific applications in lossy compression, since error control can adapt to the precision in ...

waveSZ: A Hardware-Algorithm Co-Design of Efficient Lossy Compression for Scientific Data

Conference: Tian, Jiannan; Di, Sheng; Zhang, Chengming; ...

Error-bounded lossy compression is critical to the success of extreme-scale scientific research because of the ever-increasing volumes of data produced by today's high-performance computing (HPC) applications. Not only can error-controlled lossy compressors significantly reduce the I/O and storage burden, but they can also retain high data fidelity for post-analysis. Existing state-of-the-art lossy compressors, however, generally suffer from relatively low compression and decompression throughput (up to hundreds of megabytes per second on a single CPU core), which considerably restricts the adoption of lossy compression by many HPC applications, especially those with a fairly high data production rate. In this paper, we propose ...
https://doi.org/10.1145/3332466.3374525
Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data

Conference: Lu, Tao; Liu, Qing Gary; He, Xubin; ...

Scientific simulations generate large amounts of floating-point data, which are often not very compressible using the traditional reduction schemes, such as deduplication or lossless compression. The emergence of lossy floating-point compression holds promise to satisfy the data reduction demand from HPC applications; however, lossy compression has not been widely adopted in science production. We believe a fundamental reason is that there is a lack of understanding of the benefits, pitfalls, and performance of lossy compression on scientific data. In this paper, we conduct a comprehensive study on state-of-the-art lossy compression, including ZFP, SZ, and ISABELA, using real and representative HPC ...
https://doi.org/10.1109/IPDPS.2018.00044

Ultrafast Error-bounded Lossy Compression for Scientific Datasets

Conference: Yu, Xiaodong; Di, Sheng; Zhao, Kai; ...

Today's scientific high-performance computing applications and advanced instruments are producing vast volumes of data across a wide range of domains, which impose a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely used in the scientific community because it not only can significantly reduce the data volumes but also can strictly control the data distortion based on the user-specified error bound. Existing lossy compressors, however, cannot offer ultrafast compression speed, which is highly demanded by numerous applications or use cases (such as in-memory compression and online instrument data compression). In this paper, we propose ...
https://doi.org/10.1145/3502181.3531473
ISABELA for effective in situ compression of scientific data

Journal Article: Lakshminarasimhan, Sriram; Shah, Neil; Ethier, Stephane; ... - Concurrency and Computation: Practice and Experience

Exploding dataset sizes from extreme-scale scientific simulations necessitate efficient data management and reduction schemes to mitigate I/O costs. With the discrepancy between I/O bandwidth and computational power, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Although data compression can be an effective solution, the random nature of real-valued scientific datasets renders lossless compression routines ineffective. These techniques also impose significant overhead during decompression, making them unsuitable for data analysis and visualization, which require repeated data access. To address this problem, we propose an effective method for In situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) ...
https://doi.org/10.1002/cpe.2887
