DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Novel Data Reduction Based on Statistical Similarity

Abstract

Applications such as scientific simulations and power grid monitoring generate so much data so quickly that compression is essential to reduce storage requirements or transmission capacity. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data, but this measure of distance severely limits either reconstruction quality or compression performance. In this paper, we propose a new class of compression methods that redefines the distance measure with a statistical concept known as exchangeability. This approach captures essential features of the data while reducing the storage requirement. We report our design and implementation of such a compression method, named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. In these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.
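
The idea of replacing Euclidean distance with a statistical similarity measure can be illustrated with a minimal sketch. It is not the IDEALEM implementation: the abstract does not describe the algorithm, so the block-wise scheme, the use of a two-sample Kolmogorov-Smirnov statistic as the similarity test, the output format, and the names `ks_statistic`, `compress`, `decompress`, `block_size`, and `threshold` are all assumptions made for illustration. A block that looks like it was drawn from the same distribution as an earlier stored block is replaced by a reference to that block, regardless of the order of its values.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect.bisect_right(sa, x) / len(sa)
        fb = bisect.bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def compress(data, block_size=32, threshold=0.3):
    """Encode data block by block (hypothetical format).

    Each block becomes ("new", values) if no stored block is
    statistically similar, or ("ref", index) otherwise.
    """
    dictionary = []  # blocks kept verbatim, in order of first appearance
    encoded = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        match = next(
            (i for i, kept in enumerate(dictionary)
             if ks_statistic(block, kept) <= threshold),
            None,
        )
        if match is None:
            dictionary.append(block)
            encoded.append(("new", block))
        else:
            encoded.append(("ref", match))
    return encoded


def decompress(encoded):
    """Rebuild a sequence by substituting the stored block for each reference."""
    dictionary, out = [], []
    for kind, payload in encoded:
        if kind == "new":
            dictionary.append(payload)
            out.extend(payload)
        else:
            out.extend(dictionary[payload])
    return out


# The second block is a reordering of the first, so it is stored as a
# reference; the third block has a very different distribution and is kept.
signal = [0.0, 1.0, 2.0, 3.0,  3.0, 1.0, 0.0, 2.0,  100.0, 101.0, 102.0, 103.0]
encoded = compress(signal, block_size=4, threshold=0.3)
print([kind for kind, _ in encoded])  # ['new', 'ref', 'new']
```

In this sketch the reconstruction is not close to the original point by point (the second block comes back in the order of the first), but each reconstructed block has a distribution similar to the block it replaces, and a block that differs sharply from everything seen so far is stored verbatim rather than smoothed away.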

Authors:
Lee, Dongeun [1]; Sim, Alex [1]; Choi, Jaesik [2]; Wu, Kesheng [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Ulsan National Inst. of Science and Technology (UNIST) (Korea, Republic of)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ulsan National Inst. of Science and Technology (UNIST) (Korea, Republic of)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); National Research Foundation of Korea (NRF)
OSTI Identifier:
1379521
Grant/Contract Number:  
AC02-05CH11231; NRF-2014R1A1A1002662; NRF-2014M2A8A2074096
Resource Type:
Accepted Manuscript
Journal Name:
International Conference on Scientific and Statistical Database Management (SSDBM)
Additional Journal Information:
Journal Name: International Conference on Scientific and Statistical Database Management (SSDBM); Journal Volume: 2016; Conference: 28th International Conference on Scientific and Statistical Database Management, Budapest (Hungary), 18-20 July 2016; Journal ID: ISSN 1551-6393
Publisher:
ACM - IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; floating-point data; locally exchangeable measure; lossy compression; online algorithm; time series data

Citation Formats

Lee, Dongeun, Sim, Alex, Choi, Jaesik, and Wu, Kesheng. Novel Data Reduction Based on Statistical Similarity. United States: N. p., 2016. Web. doi:10.1145/2949689.2949708.
Lee, Dongeun, Sim, Alex, Choi, Jaesik, & Wu, Kesheng. Novel Data Reduction Based on Statistical Similarity. United States. doi:10.1145/2949689.2949708.
Lee, Dongeun, Sim, Alex, Choi, Jaesik, and Wu, Kesheng. 2016. "Novel Data Reduction Based on Statistical Similarity". United States. doi:10.1145/2949689.2949708. https://www.osti.gov/servlets/purl/1379521.
@article{osti_1379521,
title = {Novel Data Reduction Based on Statistical Similarity},
author = {Lee, Dongeun and Sim, Alex and Choi, Jaesik and Wu, Kesheng},
abstractNote = {Applications such as scientific simulations and power grid monitoring generate so much data so quickly that compression is essential to reduce storage requirements or transmission capacity. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data, but this measure of distance severely limits either reconstruction quality or compression performance. In this paper, we propose a new class of compression methods that redefines the distance measure with a statistical concept known as exchangeability. This approach captures essential features of the data while reducing the storage requirement. We report our design and implementation of such a compression method, named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. In these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.},
doi = {10.1145/2949689.2949708},
journal = {International Conference on Scientific and Statistical Database Management (SSDBM)},
volume = 2016,
place = {United States},
year = {2016},
month = {7}
}
