DOE PAGES

Title: Novel Data Reduction Based on Statistical Similarity

Applications such as scientific simulations and power grid monitoring are generating so much data so quickly that compression is essential to reduce storage requirements or transmission capacity. To achieve better compression, one is often willing to discard some repeated information. Such lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data, but this measure of distance severely limits either reconstruction quality or compression performance. We propose a new class of compression methods that redefines the distance measure using a statistical concept known as exchangeability. This approach captures essential features of the data while reducing the storage requirement. In this paper, we report the design and implementation of such a compression method, named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data and show that it can reduce the volume of data much more than the best-known compression method while maintaining the quality of the compressed data. In these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.
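The abstract does not say how statistical similarity is tested, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes a two-sample Kolmogorov-Smirnov statistic as a stand-in for an exchangeability test; the function names, block size, and threshold are all hypothetical. Each block of values is either stored in full or replaced by a reference to an earlier block with a similar empirical distribution.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap


def compress(values, block_size=32, threshold=0.4):
    """Encode blocks as ('raw', block) or ('ref', index of a similar stored block).

    Assumption: 'similar' means KS statistic <= threshold (hypothetical choice).
    """
    dictionary, encoded = [], []
    for start in range(0, len(values), block_size):
        block = list(values[start:start + block_size])
        match = None
        if len(block) == block_size:  # a short trailing block is always stored raw
            for i, ref in enumerate(dictionary):
                if ks_statistic(block, ref) <= threshold:
                    match = i
                    break
        if match is None:
            dictionary.append(block)
            encoded.append(("raw", block))
        else:
            encoded.append(("ref", match))
    return encoded


def decompress(encoded):
    """Rebuild the series; referenced blocks are statistically similar, not identical."""
    dictionary, out = [], []
    for kind, payload in encoded:
        if kind == "raw":
            dictionary.append(payload)
            out.extend(payload)
        else:
            out.extend(dictionary[payload])
    return out


# Synthetic stand-in for monitoring data: steady noise with one unusual block.
random.seed(0)
values = [random.gauss(60.0, 0.01) for _ in range(3200)]
values[1600:1632] = [random.gauss(59.5, 0.01) for _ in range(32)]

encoded = compress(values)
decoded = decompress(encoded)
raw_blocks = sum(1 for kind, _ in encoded if kind == "raw")
print(f"blocks: {len(encoded)}, stored in full: {raw_blocks}")
```

In this sketch the unusual block cannot match any stored block, so it is kept verbatim, while blocks of ordinary noise collapse into references. The reconstruction preserves the distribution of each block rather than its point-by-point values, which is where such a scheme departs from Euclidean-distance-based lossy compression.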
Authors:
 [1] ;  [1] ;  [2] ;  [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Ulsan National Inst. of Science and Technology (UNIST) (Korea, Republic of)
Publication Date:
Grant/Contract Number:
AC02-05CH11231; NRF-2014R1A1A1002662; NRF-2014M2A8A2074096
Type:
Accepted Manuscript
Journal Name:
International Conference on Scientific and Statistical Database Management (SSDBM)
Additional Journal Information:
Journal Name: International Conference on Scientific and Statistical Database Management (SSDBM); Journal Volume: 2016; Conference: 28. International Conference on Scientific and Statistical Database Management, Budapest (Hungary), 18-20 July 2016; Journal ID: ISSN 1551-6393
Publisher:
ACM - IEEE
Research Org:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ulsan National Inst. of Science and Technology (UNIST) (Korea, Republic of)
Sponsoring Org:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); National Research Foundation of Korea (NRF)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; floating-point data; locally exchangeable measure; lossy compression; online algorithm; time series data
OSTI Identifier:
1379521