Title: A high-performance parallel algorithm for nonnegative matrix factorization

Abstract

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A ≈ WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementations, our algorithm is also flexible: (1) it performs well for both dense and sparse matrices, and (2) it allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors W and H within the alternating iterations. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements.
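
The alternating scheme described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' distributed MPI implementation: it assumes a small dense matrix stored as Python lists, and it swaps in the classic multiplicative update rule as a simple stand-in for a non-negative least squares solver. The function names (`nmf`, `mu_update_h`, `frob_error`) and the `update` parameter are hypothetical, chosen only to show how the per-factor update could be made pluggable.

```python
import random


def matmul(X, Y):
    """Return the matrix product of X and Y, both given as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    """Return the transpose of X as a list of rows."""
    return [list(col) for col in zip(*X)]


def frob_error(A, W, H):
    """Frobenius norm of the residual A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def mu_update_h(A, W, H, eps=1e-12):
    """One multiplicative update of H with W held fixed.

    Assumption: this rule, H <- H * (W^T A) / (W^T W H), stands in for an
    NLS solver. It keeps H non-negative when A, W and H are non-negative.
    """
    Wt = transpose(W)
    num = matmul(Wt, A)
    den = matmul(matmul(Wt, W), H)
    return [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
            for rh, rn, rd in zip(H, num, den)]


def nmf(A, k, iters=200, seed=0, update=mu_update_h):
    """Approximate A (m x n, non-negative) by W (m x k) times H (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    At = transpose(A)
    for _ in range(iters):
        # Subproblem 1: update H with W fixed.
        H = update(A, W, H)
        # Subproblem 2: update W with H fixed, by applying the same rule
        # to the transposed system A^T ~= H^T W^T.
        W = transpose(update(At, transpose(H), transpose(W)))
    return W, H
```

Calling `nmf(A, k)` on a small non-negative matrix returns factors whose product approximates `A`; passing a different function as `update` changes the rule used for each factor without touching the alternating loop.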

Authors:
Kannan, Ramakrishnan [1]; Ballard, Grey [2]; Park, Haesun [1]
  1. Georgia Tech
  2. Sandia National Laboratories
Publication Date:
Research Org.:
Lawrence Berkeley National Laboratory-National Energy Research Scientific Computing Center
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1524064
DOE Contract Number:  
AC04-94AL85000; AC02-05CH11231
Resource Type:
Journal Article
Country of Publication:
United States
Language:
English

Citation Formats

Kannan, Ramakrishnan, Ballard, Grey, and Park, Haesun. A high-performance parallel algorithm for nonnegative matrix factorization. United States: N. p., 2016. Web. doi:10.1145/2851141.2851152.
Kannan, Ramakrishnan, Ballard, Grey, & Park, Haesun. A high-performance parallel algorithm for nonnegative matrix factorization. United States. doi:10.1145/2851141.2851152.
Kannan, Ramakrishnan, Ballard, Grey, and Park, Haesun. 2016. "A high-performance parallel algorithm for nonnegative matrix factorization". United States. doi:10.1145/2851141.2851152.
@article{osti_1524064,
title = {A high-performance parallel algorithm for nonnegative matrix factorization},
author = {Kannan, Ramakrishnan and Ballard, Grey and Park, Haesun},
doi = {10.1145/2851141.2851152},
place = {United States},
year = {2016},
month = {1}
}
