A high-performance parallel algorithm for nonnegative matrix factorization
Abstract
Nonnegative matrix factorization (NMF) is the problem of determining two nonnegative low-rank factors W and H, for a given input matrix A, such that A ≈ WH. NMF is a useful tool for many applications in different domains, such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating nonnegative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). In contrast to previous implementations, our algorithm is also flexible: (1) it performs well for both dense and sparse matrices, and (2) it allows the user to choose any one of multiple algorithms for solving the updates to the low-rank factors W and H within the alternating iterations. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements.
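The alternating structure described in the abstract can be illustrated with a small single-process sketch. This is not the distributed MPI algorithm from the paper: it assumes NumPy, runs on one machine, and uses the classical multiplicative update rule as a simple stand-in for the per-factor solver (an exact NLS solver could be substituted at the two marked update steps). The function name `nmf_alternating` and its parameters `k`, `iters`, `eps`, and `seed` are hypothetical names chosen for illustration.

```python
import numpy as np


def nmf_alternating(A, k, iters=200, eps=1e-9, seed=0):
    """Approximate a nonnegative m x n matrix A by W @ H.

    W is m x k and H is k x n, both kept nonnegative.
    Illustrative sketch only: the local solver is a multiplicative
    update, not an exact nonnegative least squares solve.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))  # random nonnegative starting factors
    H = rng.random((k, n))
    errors = []
    for _ in range(iters):
        # Step 1: update H with W held fixed.
        WtW = W.T @ W  # k x k Gram matrix
        WtA = W.T @ A  # k x n
        H *= WtA / (WtW @ H + eps)  # eps avoids division by zero

        # Step 2: update W with H held fixed.
        HHt = H @ H.T  # k x k Gram matrix
        AHt = A @ H.T  # m x k
        W *= AHt / (W @ HHt + eps)

        # Track the relative Frobenius-norm error of the approximation.
        errors.append(np.linalg.norm(A - W @ H) / np.linalg.norm(A))
    return W, H, errors


# Usage: factor a small synthetic matrix of rank 3.
rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 15))
W, H, errors = nmf_alternating(A, k=3)
print(f"relative error: first iteration {errors[0]:.4f}, last {errors[-1]:.4f}")
```

Each half-step only needs a k x k Gram matrix and one product with A, so the per-factor solver can be swapped without changing the outer loop, which is the kind of flexibility the abstract refers to.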
 Authors:
 Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
 Georgia Tech
 Sandia National Laboratories
 Publication Date:
 Research Org.:
 Lawrence Berkeley National Laboratory; National Energy Research Scientific Computing Center
 Sponsoring Org.:
 USDOE Office of Science (SC)
 OSTI Identifier:
 1524064
 DOE Contract Number:
 AC04-94AL85000; AC02-05CH11231
 Resource Type:
 Journal Article
 Country of Publication:
 United States
 Language:
 English
Citation Formats
Kannan, Ramakrishnan, Ballard, Grey, and Park, Haesun. A high-performance parallel algorithm for nonnegative matrix factorization. United States: N. p., 2016. Web. doi:10.1145/2851141.2851152.
@article{osti_1524064,
title = {A high-performance parallel algorithm for nonnegative matrix factorization},
author = {Kannan, Ramakrishnan and Ballard, Grey and Park, Haesun},
doi = {10.1145/2851141.2851152},
place = {United States},
year = {2016},
month = {1}
}