
Title: Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization

Conference

Non-negative matrix factorization (NMF), the problem of finding two non-negative low-rank factors whose product approximates an input matrix, is a useful tool for many data mining and scientific applications, such as topic modeling in text mining and unmixing in microscopy. In this paper, we focus on scaling NMF algorithms to very large sparse datasets and massively parallel machines by employing effective algorithms, communication patterns, and partitioning schemes that leverage the sparsity of the input matrix. We consider two previous works developed for related problems: one pairs a fine-grained partitioning strategy with a point-to-point communication pattern, and the other pairs a Cartesian, or checkerboard, partitioning strategy with a collective-based communication pattern. We show that a combination of these approaches balances the demands of the various computations within NMF algorithms and achieves high efficiency and scalability. In our experiments, the proposed strategy runs up to 10x faster than the state of the art on real-world datasets.
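To make the Cartesian (checkerboard) partitioning mentioned in the abstract concrete, the following is a minimal sketch in Python of how the nonzeros of a sparse matrix might be assigned to a two-dimensional process grid. It is an illustration under stated assumptions, not the implementation from the paper: the function names `block_owner` and `checkerboard_partition`, the contiguous equal-sized blocking, and the small example matrix are all hypothetical.

```python
from collections import defaultdict


def block_owner(index, extent, parts):
    """Return which of `parts` contiguous blocks owns `index` in [0, extent)."""
    # Assumption: blocks have size ceil(extent / parts); the last may be smaller.
    block = -(-extent // parts)
    return index // block


def checkerboard_partition(nonzeros, shape, grid):
    """Assign each nonzero (i, j, value) to a process on a pr x pc grid."""
    m, n = shape
    pr, pc = grid
    parts = defaultdict(list)
    for i, j, v in nonzeros:
        # The owner is fixed by the block row of i and the block column of j.
        owner = (block_owner(i, m, pr), block_owner(j, n, pc))
        parts[owner].append((i, j, v))
    return dict(parts)


# Hypothetical 4 x 6 sparse matrix in coordinate (COO) form.
nonzeros = [
    (0, 0, 1.0), (0, 5, 2.0), (1, 2, 3.0),
    (2, 3, 4.0), (3, 1, 5.0), (3, 4, 6.0),
]
parts = checkerboard_partition(nonzeros, shape=(4, 6), grid=(2, 2))

# Report how many nonzeros each process owns (a simple load-balance check).
for owner in sorted(parts):
    print(owner, len(parts[owner]))
```

In a layout like this, the process at grid position (r, c) typically works only with the factor rows in block row r and the factor columns in block column c, which is why collective operations along process rows and columns are a natural fit. A fine-grained partitioning would instead assign individual nonzeros freely, trading that regular structure for better balance on irregular sparsity.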

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1470857
Resource Relation:
Conference: 47th International Conference on Parallel Processing, Eugene, Oregon, United States of America, August 13-16, 2018
Country of Publication:
United States
Language:
English
