skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable tensor factorizations with missing data.

Conference ·
OSTI ID:1017019
 [1]; ;  [2];
  1. Technical University of Denmark
  2. Turkish National Research Institute of Electronics and Cryptology

The problem of missing data is ubiquitous in domains such as biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks|all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multi-way arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.

Research Organization:
Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1017019
Report Number(s):
SAND2010-2749C; TRN: US201113%%262
Resource Relation:
Conference: Proposed for presentation at the SIAM Conference on Data Mining held April 29-May 1, 2010 in Columbus, OH.
Country of Publication:
United States
Language:
English