DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: A dictionary learning algorithm for compression and reconstruction of streaming data in preset order

Journal Article · Discrete and Continuous Dynamical Systems - Series S

Interest in developing and applying dictionary learning (DL) to process massive datasets has grown over the last decade. Many of these efforts, however, focus on employing DL to compress data and extract a set of important features, while treating restoration of the original data from this set as a secondary goal. In addition, although several methods can process streaming data by updating the dictionary incrementally as new snapshots pass by, most of those algorithms are designed for the setting where the snapshots are randomly drawn from a probability distribution. In this paper, we present a new DL approach to compress and denoise massive datasets in real time, in which the data are streamed in a preset order (examples include videos and temporal experimental data), so that at any time we can observe only a biased sample of the whole dataset. Our approach incrementally builds up the dictionary in a relatively simple manner: if the new snapshot is adequately explained by the current dictionary, we perform sparse coding to find its sparse representation; otherwise, we add the new snapshot to the dictionary, using a Gram-Schmidt process to maintain orthogonality. To compress and denoise noisy datasets, we apply denoising to the snapshot directly before sparse coding, which deviates from the traditional dictionary learning approach of achieving denoising via sparse coding. Compared to full-batch matrix decomposition methods, where the whole dataset is kept in memory, and other mini-batch approaches, where unbiased sampling is often assumed, our approach has minimal requirements for data sampling and storage: i) each snapshot is seen only once and then discarded, and ii) the snapshots are drawn in a preset order, so they can be highly biased.
Through experiments on climate simulations and scanning transmission electron microscopy (STEM) data, we demonstrate that the proposed approach performs competitively with those methods in data reconstruction and denoising.
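
The incremental update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`stream_dictionary`, `hard_threshold`), the relative-residual test with threshold `tol`, the optional `denoise` callable, and the use of hard thresholding as the sparse coding step are all assumptions made for illustration. Each snapshot is seen once, in the order given; it is either coded against the current orthonormal dictionary or, if poorly explained, its residual is normalized and appended as a new atom (one Gram-Schmidt step).

```python
import numpy as np


def hard_threshold(coeffs, k):
    """Keep the k largest-magnitude coefficients (assumed sparse coding step)."""
    if k is None or k >= coeffs.size:
        return coeffs
    out = np.zeros_like(coeffs)
    keep = np.argsort(np.abs(coeffs))[coeffs.size - k:]
    out[keep] = coeffs[keep]
    return out


def stream_dictionary(snapshots, tol=1e-2, sparsity=None, denoise=None):
    """Process snapshots once each, in the given (preset) order.

    Returns (D, codes): D has orthonormal columns (atoms), and codes[i]
    holds the coefficients of snapshot i with respect to the atoms that
    existed right after snapshot i was processed.
    """
    D = None  # dictionary of shape (n_features, n_atoms); empty at the start
    codes = []
    for x in snapshots:
        x = np.asarray(x, dtype=float)
        if denoise is not None:
            x = denoise(x)  # denoise the snapshot before any coding
        if D is None:
            coeffs, residual = np.zeros(0), x
        else:
            coeffs = D.T @ x           # projection onto the current atoms
            residual = x - D @ coeffs  # part the dictionary cannot explain
        x_norm = np.linalg.norm(x)
        r_norm = np.linalg.norm(residual)
        if x_norm > 0 and r_norm > tol * x_norm:
            # Not adequately explained: append the normalized residual,
            # which is orthogonal to all existing atoms.
            atom = residual / r_norm
            D = atom[:, None] if D is None else np.column_stack([D, atom])
            coeffs = np.append(coeffs, r_norm)
        else:
            # Adequately explained: keep only a sparse set of coefficients.
            coeffs = hard_threshold(coeffs, sparsity)
        codes.append(coeffs)
    return D, codes


def reconstruct(D, code):
    """Rebuild a snapshot from its code; later atoms are not needed."""
    return D[:, :code.size] @ code
```

A snapshot is recovered with `reconstruct(D, codes[i])`. A single Gram-Schmidt pass can lose orthogonality in floating point as the dictionary grows, so a practical version would likely re-orthogonalize; that refinement is omitted here.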

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR). Scientific Discovery through Advanced Computing (SciDAC); USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
AC05-00OR22725; AC02-05CH11231
OSTI ID:
1883981
Journal Information:
Journal Name: Discrete and Continuous Dynamical Systems - Series S; Journal Volume: 15; Journal Issue: 4; ISSN 1937-1632
Publisher:
American Institute of Mathematical Sciences (AIMS)
Country of Publication:
United States
Language:
English
