DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A dictionary learning algorithm for compression and reconstruction of streaming data in preset order

Abstract

There has been growing interest in developing and applying dictionary learning (DL) to process massive datasets over the last decade. Many of these efforts, however, focus on employing DL to compress data and extract a set of important features, while treating the restoration of the original data from this set as a secondary goal. On the other hand, although several methods can process streaming data by updating the dictionary incrementally as new snapshots pass by, most of those algorithms are designed for the setting where the snapshots are randomly drawn from a probability distribution. In this paper, we present a new DL approach to compress and denoise massive datasets in real time, in which the data are streamed through in a preset order (examples include videos and temporal experimental data), so at any time we can only observe a biased sample set of the whole dataset. Our approach builds up the dictionary incrementally in a relatively simple manner: if the new snapshot is adequately explained by the current dictionary, we perform sparse coding to find its sparse representation; otherwise, we add the new snapshot to the dictionary, applying a Gram-Schmidt process to maintain orthogonality. To compress and denoise noisy datasets, we apply the denoising to the snapshot directly before sparse coding, which deviates from the traditional dictionary learning approach of achieving denoising via sparse coding. Compared to full-batch matrix decomposition methods, where the whole dataset is kept in memory, and to other mini-batch approaches, where unbiased sampling is often assumed, our approach has minimal requirements on data sampling and storage: i) each snapshot is seen only once and then discarded, and ii) the snapshots are drawn in a preset order, so they can be highly biased.
Through experiments on climate simulations and scanning transmission electron microscopy (STEM) data, we demonstrate that the proposed approach performs competitively with those methods in data reconstruction and denoising.
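The incremental update described in the abstract can be sketched as follows. This is an illustrative reconstruction from the abstract's description, not the authors' code: the threshold name `tol` is a hypothetical parameter, and a dense orthogonal projection stands in for the paper's sparse coding step (with orthonormal atoms, the projection coefficients are simply the inner products with the snapshot).

```python
import numpy as np

def stream_update(D, y, tol=0.1):
    """One step of the streaming dictionary update (illustrative sketch).

    D   : (n, k) array with orthonormal columns (current dictionary), or None.
    y   : (n,) new snapshot.
    tol : relative reconstruction-error threshold (hypothetical name).
    Returns the (possibly grown) dictionary and the code for y.
    """
    if D is None or D.shape[1] == 0:
        # First snapshot: seed the dictionary with its normalized direction.
        atom = y / np.linalg.norm(y)
        return atom[:, None], np.array([np.linalg.norm(y)])
    # Orthonormal columns => best coefficients are the projections <d_i, y>.
    c = D.T @ y
    residual = y - D @ c
    if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
        # Snapshot adequately explained: keep the dictionary, return the code.
        return D, c
    # Otherwise, Gram-Schmidt: append the normalized residual, which is
    # orthogonal to every existing atom by construction.
    atom = residual / np.linalg.norm(residual)
    D = np.hstack([D, atom[:, None]])
    return D, np.append(c, np.linalg.norm(residual))
```

Each snapshot is touched once and then discarded, matching the single-pass, biased-order setting the abstract emphasizes; only the dictionary and the per-snapshot codes are retained.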

Authors:
Archibald, Richard [1]; Tran, Hoang [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
2022-04-01
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR). Scientific Discovery through Advanced Computing (SciDAC); USDOE Laboratory Directed Research and Development (LDRD) Program
OSTI Identifier:
1883981
Grant/Contract Number:  
AC05-00OR22725; AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Discrete and Continuous Dynamical Systems - Series S
Additional Journal Information:
Journal Volume: 15; Journal Issue: 4; Journal ID: ISSN 1937-1632
Publisher:
American Institute of Mathematical Sciences (AIMS)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; dictionary learning; matrix factorization; online algorithm

Citation Formats

Archibald, Richard, and Tran, Hoang. A dictionary learning algorithm for compression and reconstruction of streaming data in preset order. United States: N. p., 2022. Web. doi:10.3934/dcdss.2021102.
Archibald, Richard, & Tran, Hoang. A dictionary learning algorithm for compression and reconstruction of streaming data in preset order. United States. https://doi.org/10.3934/dcdss.2021102
Archibald, Richard, and Tran, Hoang. 2022. "A dictionary learning algorithm for compression and reconstruction of streaming data in preset order". United States. https://doi.org/10.3934/dcdss.2021102. https://www.osti.gov/servlets/purl/1883981.
@article{osti_1883981,
title = {A dictionary learning algorithm for compression and reconstruction of streaming data in preset order},
author = {Archibald, Richard and Tran, Hoang},
abstractNote = {There has been growing interest in developing and applying dictionary learning (DL) to process massive datasets over the last decade. Many of these efforts, however, focus on employing DL to compress data and extract a set of important features, while treating the restoration of the original data from this set as a secondary goal. On the other hand, although several methods can process streaming data by updating the dictionary incrementally as new snapshots pass by, most of those algorithms are designed for the setting where the snapshots are randomly drawn from a probability distribution. In this paper, we present a new DL approach to compress and denoise massive datasets in real time, in which the data are streamed through in a preset order (examples include videos and temporal experimental data), so at any time we can only observe a biased sample set of the whole dataset. Our approach builds up the dictionary incrementally in a relatively simple manner: if the new snapshot is adequately explained by the current dictionary, we perform sparse coding to find its sparse representation; otherwise, we add the new snapshot to the dictionary, applying a Gram-Schmidt process to maintain orthogonality. To compress and denoise noisy datasets, we apply the denoising to the snapshot directly before sparse coding, which deviates from the traditional dictionary learning approach of achieving denoising via sparse coding. Compared to full-batch matrix decomposition methods, where the whole dataset is kept in memory, and to other mini-batch approaches, where unbiased sampling is often assumed, our approach has minimal requirements on data sampling and storage: i) each snapshot is seen only once and then discarded, and ii) the snapshots are drawn in a preset order, so they can be highly biased.
Through experiments on climate simulations and scanning transmission electron microscopy (STEM) data, we demonstrate that the proposed approach performs competitively with those methods in data reconstruction and denoising.},
doi = {10.3934/dcdss.2021102},
journal = {Discrete and Continuous Dynamical Systems - Series S},
number = 4,
volume = 15,
place = {United States},
year = {2022},
month = {apr}
}
