Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization

Kaya, Oguz; Kannan, Ramakrishnan; Ballard, Grey

doi:10.1145/3225058.3225127

Conference · Wed Aug 01 04:00:00 EDT 2018

Kaya, Oguz ^[1]; Kannan, Ramakrishnan ^[2]; Ballard, Grey ^[3]

^[1] Inria Bordeaux
^[2] ORNL
^[3] Wake Forest University, Winston-Salem

Non-negative matrix factorization (NMF), the problem of finding two non-negative low-rank factors whose product approximates an input matrix, is a useful tool for many data mining and scientific applications, such as topic modeling in text mining and unmixing in microscopy. In this paper, we focus on scaling NMF algorithms to very large sparse datasets and massively parallel machines by employing effective algorithms, communication patterns, and partitioning schemes that leverage the sparsity of the input matrix. We consider two previous works developed for related problems: one pairs a fine-grained partitioning strategy with a point-to-point communication pattern, and the other pairs a Cartesian, or checkerboard, partitioning strategy with a collective-based communication pattern. We show that a combination of the two approaches balances the demands of the various computations within NMF algorithms and achieves high efficiency and scalability. In our experiments, the proposed strategy runs up to 10x faster than the state of the art on real-world datasets.
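To make the contrast between the two partitioning families concrete, here is a minimal Python sketch (not the authors' implementation) that measures how evenly the nonzeros of a sparse matrix are spread across processes. All function names are hypothetical, the round-robin assignment is an assumed stand-in for a real fine-grained partitioner, and the toy matrix is made up for illustration. The sketch captures only computational load balance, not the communication volume that the paper's strategies also target.

```python
from collections import Counter


def checkerboard_loads(nonzeros, m, n, pr, pc):
    """Count nonzeros owned by each process (r, c) of a pr x pc grid when the
    m x n matrix is cut into contiguous row and column blocks (Cartesian
    partitioning)."""
    rows_per_block = -(-m // pr)  # ceiling division
    cols_per_block = -(-n // pc)
    return Counter((i // rows_per_block, j // cols_per_block) for i, j in nonzeros)


def fine_grained_loads(nonzeros, p):
    """Count nonzeros per process when each nonzero is assigned on its own.
    Round-robin is an assumed stand-in; real fine-grained schemes typically use
    hypergraph partitioning so that communication is also kept low."""
    return Counter(k % p for k in range(len(nonzeros)))


def imbalance(loads, p, nnz):
    """Max load divided by the ideal load nnz / p (1.0 means perfect balance).
    Processes that own no nonzeros simply do not appear in `loads`."""
    return max(loads.values()) / (nnz / p)


# Hypothetical 4 x 4 sparse matrix whose nonzeros cluster in the top-left corner.
nonzeros = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 3), (3, 2)]

cart = checkerboard_loads(nonzeros, m=4, n=4, pr=2, pc=2)  # 4 of 6 nonzeros land on (0, 0)
fine = fine_grained_loads(nonzeros, p=4)

print("checkerboard:", dict(cart), imbalance(cart, 4, len(nonzeros)))
print("fine-grained:", dict(fine), imbalance(fine, 4, len(nonzeros)))
```

On this skewed toy matrix, the checkerboard blocks leave two of the four processes with no nonzeros, while the per-nonzero assignment is close to balanced. The cost, not modeled here, is that a row's or column's nonzeros can be scattered over many processes, which tends to require irregular, point-to-point communication rather than structured collectives along grid rows and columns.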

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 1470857

Country of Publication: United States

Language: English

Similar Records

Multifrontal Non-negative Matrix Factorization

Conference · Sat Feb 29 23:00:00 EST 2020 · OSTI ID:1649537

MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

Journal Article · Sun Oct 29 20:00:00 EDT 2017 · IEEE Transactions on Knowledge and Data Engineering · OSTI ID:1429224

A high-performance parallel algorithm for nonnegative matrix factorization

Journal Article · Thu Dec 31 23:00:00 EST 2015 · OSTI ID:1524064