
Title: Distributed out-of-memory NMF on CPU/GPU architectures

Journal Article · Journal of Supercomputing
 [1];  [1];  [1];  [2];  [3];  [2];  [1]
  1. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States). Theoretical Division
  2. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States). Computer, Computational, and Statistical Science Division
  3. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States). HPC Division

We propose an efficient distributed out-of-memory implementation of the non-negative matrix factorization (NMF) algorithm for heterogeneous high-performance-computing systems. The implementation builds on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse matrix operations on multi-node, multi-GPU systems. The resulting algorithm is optimized for out-of-memory problems, where the memory required to factorize a given matrix is greater than the available GPU memory. Memory complexity is reduced by batching/tiling strategies, and sparse and dense matrix operations are significantly accelerated with GPU cores (or tensor cores when available). Input/output latency associated with batch copies between host and device is hidden by using CUDA streams to overlap data transfers and compute asynchronously, and latency associated with collective communications (both intra-node and inter-node) is reduced using optimized NVIDIA Collective Communication Library (NCCL) based communicators. Benchmark results show that the new GPU implementation achieves speedups of 32x to 76x over the CPU-based NMFk. Good weak scaling was demonstrated on up to 4096 multi-GPU cluster nodes with approximately 25,000 GPUs when decomposing a dense 340 terabyte-size matrix and an 11 exabyte-size sparse matrix of density 10^-6.
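
The batching/tiling idea in the abstract can be illustrated with a minimal single-process sketch. It assumes the standard Frobenius-norm multiplicative updates for A ≈ WH and tiles the matrix by rows, so that only one tile of A and W is needed at a time. The function name `batched_nmf`, its parameters, and the choice of NumPy are hypothetical and for illustration only; the sketch is not the NMFk implementation and does not cover the GPU, CUDA-stream, or NCCL parts of the work.

```python
import numpy as np

def batched_nmf(A, k, batch_rows, n_iter=50, eps=1e-9, seed=0):
    """Row-tiled multiplicative-update NMF (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # H update: accumulate the small products W^T A (k x n) and
        # W^T W (k x k) one row tile at a time.
        WtA = np.zeros((k, n))
        WtW = np.zeros((k, k))
        for start in range(0, m, batch_rows):
            rows = slice(start, start + batch_rows)
            A_b, W_b = A[rows], W[rows]
            WtA += W_b.T @ A_b
            WtW += W_b.T @ W_b
        H *= WtA / (WtW @ H + eps)

        # W update: given H H^T (k x k), every row tile of W can be
        # updated independently of the others.
        HHt = H @ H.T
        for start in range(0, m, batch_rows):
            rows = slice(start, start + batch_rows)
            W[rows] *= (A[rows] @ H.T) / (W[rows] @ HHt + eps)
    return W, H
```

In this sketch the tile size `batch_rows` only changes how much of A is touched at once, not the result (up to floating-point rounding). In an out-of-memory setting each tile would be copied to the device in turn, which is the step the abstract describes as being overlapped with compute.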

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA), Office of Defense Nuclear Nonproliferation; USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
89233218CNA000001; AC52-06NA25396; 20190020DR
OSTI ID:
2246858
Report Number(s):
LA-UR-23-33139
Journal Information:
Journal of Supercomputing, Vol. 80; ISSN 0920-8542
Publisher:
Springer
Country of Publication:
United States
Language:
English

