# Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data

## Abstract

Since today’s scientific applications are producing vast amounts of data, compressing them before storage/transmission is critical. Results of existing compressors show two types of HPC data sets: highly compressible and hard to compress. In this work, we carefully design and optimize the error-bounded lossy compression for hard-to-compress scientific data. We propose an optimized algorithm that can adaptively partition the HPC data into best-fit consecutive segments each having mutually close data values, such that the compression condition can be optimized. Another significant contribution is the optimization of shifting offset such that the XOR-leading-zero length between two consecutive unpredictable data points can be maximized. We finally devise an adaptive method to select the best-fit compressor at runtime for maximizing the compression factor. We evaluate our solution using 13 benchmarks based on real-world scientific problems, and we compare it with 9 other state-of-the-art compressors. Experiments show that our compressor can always guarantee the compression errors within the user-specified error bounds. Most importantly, our optimization can improve the compression factor effectively, by up to 49% for hard-to-compress data sets with similar compression/decompression time cost.
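To make the "XOR-leading-zero length" and "shifting offset" ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes 64-bit IEEE 754 doubles and a brute-force search over a caller-supplied list of candidate offsets; the function names (`float_bits`, `xor_leading_zeros`, `total_leading_zeros`, `best_offset`) are hypothetical and chosen only for illustration.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 bit pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_leading_zeros(a: float, b: float) -> int:
    """Number of identical leading bits in the patterns of a and b (64 if equal)."""
    return 64 - (float_bits(a) ^ float_bits(b)).bit_length()


def total_leading_zeros(values, offset=0.0):
    """Sum of XOR-leading-zero lengths over consecutive values after adding offset."""
    shifted = [v + offset for v in values]
    return sum(xor_leading_zeros(p, q) for p, q in zip(shifted, shifted[1:]))


def best_offset(values, candidates):
    """Illustrative brute force: pick the candidate offset with the largest total.

    Assumption: the real compressor derives its offset differently; this only
    shows why shifting can help.
    """
    return max(candidates, key=lambda c: total_leading_zeros(values, c))


if __name__ == "__main__":
    # Values that straddle 1.0 have different exponents, so few leading bits match.
    data = [0.9, 1.1, 0.95, 1.05]
    for offset in (0.0, 2.0):
        print(offset, total_leading_zeros(data, offset))
    print("chosen offset:", best_offset(data, [0.0, 2.0]))
```

In this toy input the values sit on both sides of a power of two, so their exponent fields differ. Adding 2.0 moves them all into the range [2, 4), where they share an exponent and therefore a longer common bit prefix. Note that adding an offset in floating point can itself round the values, so a real compressor would need to account for that within its error bound.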

- Authors:
- Di, Sheng; Cappello, Franck

- Publication Date:
- 2018-01-01
- Research Org.:
- Argonne National Lab. (ANL), Argonne, IL (United States)

- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)

- OSTI Identifier:
- 1417025

- DOE Contract Number:
- AC02-06CH11357

- Resource Type:
- Journal Article

- Resource Relation:
- Journal Name: IEEE Transactions on Parallel and Distributed Systems; Journal Volume: 29; Journal Issue: 1

- Country of Publication:
- United States

- Language:
- English

- Subject:
- Error-bounded lossy compression; floating-point data compression; high performance computing; scientific simulation

### Citation Formats

Di, Sheng, and Cappello, Franck. *Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data*. United States: N. p., 2018. Web. doi:10.1109/TPDS.2017.2749300.

Di, Sheng, & Cappello, Franck. *Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data*. United States. doi:10.1109/TPDS.2017.2749300.

Di, Sheng, and Cappello, Franck. 2018. "Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data". United States. doi:10.1109/TPDS.2017.2749300.

```
@article{osti_1417025,
  title = {Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data},
  author = {Di, Sheng and Cappello, Franck},
  abstractNote = {Since today’s scientific applications are producing vast amounts of data, compressing them before storage/transmission is critical. Results of existing compressors show two types of HPC data sets: highly compressible and hard to compress. In this work, we carefully design and optimize the error-bounded lossy compression for hard-to-compress scientific data. We propose an optimized algorithm that can adaptively partition the HPC data into best-fit consecutive segments each having mutually close data values, such that the compression condition can be optimized. Another significant contribution is the optimization of shifting offset such that the XOR-leading-zero length between two consecutive unpredictable data points can be maximized. We finally devise an adaptive method to select the best-fit compressor at runtime for maximizing the compression factor. We evaluate our solution using 13 benchmarks based on real-world scientific problems, and we compare it with 9 other state-of-the-art compressors. Experiments show that our compressor can always guarantee the compression errors within the user-specified error bounds. Most importantly, our optimization can improve the compression factor effectively, by up to 49% for hard-to-compress data sets with similar compression/decompression time cost.},
  doi = {10.1109/TPDS.2017.2749300},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  number = 1,
  volume = 29,
  place = {United States},
  year = {2018},
  month = {1}
}
```