Title: MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

Abstract

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
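
To make the alternating-updating idea in the abstract concrete, the following is a minimal sequential sketch of NMF using the Multiplicative Update rule, one of the algorithms the abstract names. It is not the MPI-FAUN implementation: it runs on a single process, stores matrices as plain Python lists, and does none of the data distribution or MPI communication the paper describes. The function names, the `eps` safeguard, and the random initialization are assumptions made for illustration.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def frobenius_error(A, W, H):
    """Return the Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf_multiplicative_update(A, k, iters=200, eps=1e-12, seed=0):
    """Approximate a non-negative m x n matrix A by W (m x k) times H (k x n).

    Each iteration alternates between updating H with W fixed and
    updating W with H fixed. eps (an assumption of this sketch) guards
    against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Update H with W fixed: H <- H * (W^T A) / (W^T W H), elementwise.
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, pr, qr)]
             for hr, pr, qr in zip(H, num, den)]
        # Update W with H fixed: W <- W * (A H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, pr, qr)]
             for wr, pr, qr in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    # A small non-negative matrix with an exact rank-2 non-negative factorization.
    A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    W, H = nmf_multiplicative_update(A, k=2)
    print("residual:", frobenius_error(A, W, H))
```

Because the updates only multiply non-negative entries by non-negative ratios, W and H stay non-negative throughout. The products WᵀA, WᵀW, AHᵀ, and HHᵀ are the matrix multiplications that a distributed implementation would have to compute across processors.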

Authors:
Kannan, Ramakrishnan [1]; Ballard, Grey [2]; Park, Haesun [3]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Wake Forest Univ., Winston-Salem, NC (United States)
  3. Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC); National Science Foundation (NSF); US Air Force Office of Scientific Research (AFOSR)
OSTI Identifier:
1429224
Grant/Contract Number:  
AC05-00OR22725; IIS-1348152; ACI-1338745; ACI-1642385; FA8750-12-2-0309; FA9550-13-1-0100
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Knowledge and Data Engineering
Additional Journal Information:
Journal Volume: 30; Journal Issue: 3; Journal ID: ISSN 1041-4347
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Program processors; Computational modeling; Algorithm design and analysis; Sparse matrices; Approximation algorithms; Collaboration; Analytical models; HPC; NMF; MPI; 2D

Citation Formats

Kannan, Ramakrishnan, Ballard, Grey, and Park, Haesun. MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization. United States: N. p., 2017. Web. doi:10.1109/TKDE.2017.2767592.
Kannan, Ramakrishnan, Ballard, Grey, & Park, Haesun. MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization. United States. doi:https://doi.org/10.1109/TKDE.2017.2767592
Kannan, Ramakrishnan, Ballard, Grey, and Park, Haesun. 2017. "MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization". United States. doi:https://doi.org/10.1109/TKDE.2017.2767592. https://www.osti.gov/servlets/purl/1429224.
@article{osti_1429224,
title = {MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization},
author = {Kannan, Ramakrishnan and Ballard, Grey and Park, Haesun},
abstractNote = {Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.},
doi = {10.1109/TKDE.2017.2767592},
journal = {IEEE Transactions on Knowledge and Data Engineering},
number = 3,
volume = 30,
place = {United States},
year = {2017},
month = {10}
}

Citation Metrics:
Cited by: 4 works
Citation information provided by
Web of Science

Works referencing / citing this record:

Optimizing partitioned CSR-based SpGEMM on the Sunway TaihuLight
journal, March 2019

  • Chen, Yuedan; Xiao, Guoqing; Yang, Wangdong
  • Neural Computing and Applications, Vol. 32, Issue 10
  • DOI: 10.1007/s00521-019-04121-z

Scaling sparse matrix-matrix multiplication in the accumulo database
journal, January 2019


SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping
conference, August 2018

  • Perros, Ioakeim; Papalexakis, Evangelos E.; Park, Haesun
  • KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
  • DOI: 10.1145/3219819.3219999

Deep data analysis via physically constrained linear unmixing: universal framework, domain examples, and a community-wide platform
journal, April 2018

  • Kannan, R.; Ievlev, A. V.; Laanait, N.
  • Advanced Structural and Chemical Imaging, Vol. 4, Issue 1
  • DOI: 10.1186/s40679-018-0055-8