OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimizing High Performance Markov Clustering for Pre-Exascale Architectures

Journal Article · Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)
 [1];  [2];  [2];  [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Indiana Univ., Bloomington, IN (United States)

HipMCL is a high-performance distributed-memory implementation of the popular Markov Cluster Algorithm (MCL) that can cluster large-scale networks within hours using a few thousand CPU-equipped nodes. It relies on sparse matrix computations and makes heavy use of the sparse matrix-sparse matrix multiplication kernel (SpGEMM). The existing parallel algorithms in HipMCL do not scale to exascale architectures: their communication costs dominate the runtime at large concurrencies, and they cannot take advantage of the increasingly popular accelerators. In this work, we systematically remove the scalability and performance bottlenecks of HipMCL. We enable GPUs by performing the expensive expansion phase of the MCL algorithm on the GPU. Additionally, we propose a CPU-GPU joint distributed SpGEMM algorithm called pipelined Sparse SUMMA and integrate a fast and accurate probabilistic memory requirement estimator. Furthermore, we develop a new merging algorithm for the incremental processing of partial results produced by the GPUs, which improves the overlap efficiency and the peak memory usage. We also integrate a recent, faster algorithm for performing SpGEMM on CPUs. We validate our new algorithms and optimizations with extensive evaluations. With GPUs enabled and the new algorithms integrated, HipMCL is up to 12.4x faster, clustering a network with 70 million proteins and 68 billion connections in just under 15 minutes on 1024 nodes of ORNL's Summit supercomputer.
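The MCL iteration the abstract refers to alternates an expansion step (a sparse matrix-matrix product, the SpGEMM kernel HipMCL parallelizes) with an inflation-and-pruning step on a column-stochastic matrix. A minimal pure-Python sketch follows; the dict-of-dicts sparse format, the inflation parameter r = 2, and the pruning threshold are illustrative choices for a toy graph, not HipMCL's actual data structures or defaults.

```python
# Sketch of the serial MCL iteration: expansion (SpGEMM) + inflation + pruning.
# Columns are stored as {column: {row: value}} dicts and kept column-stochastic.

def normalize(cols):
    # Scale each column so its entries sum to 1 (column-stochastic).
    return {j: {i: v / s for i, v in col.items()}
            for j, col in cols.items()
            for s in [sum(col.values())]}

def expand(cols):
    # Expansion: M * M via a column-wise SpGEMM. Column j of the product is a
    # sparse linear combination of the columns of M selected by column j.
    out = {}
    for j, col in cols.items():
        acc = {}
        for k, vkj in col.items():
            for i, vik in cols[k].items():
                acc[i] = acc.get(i, 0.0) + vik * vkj
        out[j] = acc
    return out

def inflate(cols, r=2.0, prune=1e-4):
    # Inflation: raise entries to the power r, drop tiny entries, renormalize.
    powered = {j: {i: v ** r for i, v in col.items() if v ** r > prune}
               for j, col in cols.items()}
    return normalize(powered)

def mcl(adj, r=2.0, iters=20):
    m = normalize(adj)
    for _ in range(iters):
        m = inflate(expand(m), r)
    # In the limit, columns of nodes in one cluster converge to (nearly) the
    # same vector; label each node by the smallest heavy entry of its column.
    clusters = {}
    for j, col in m.items():
        top = max(col.values())
        label = min(i for i, v in col.items() if v >= 0.99 * top)
        clusters.setdefault(label, set()).add(j)
    return sorted(sorted(c) for c in clusters.values())

# Toy network: two triangles joined by a single bridge edge, plus self-loops.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {i: {i: 1.0} for i in range(6)}
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0
print(mcl(adj))
```

Inflation amplifies strong transition probabilities and starves weak ones, so the bridge edge dies off and the two triangles emerge as separate clusters. The naive dense-accumulator `expand` above is exactly the quadratic-cost hotspot that motivates the GPU offload and pipelined Sparse SUMMA described in the abstract.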

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231; AC05-00OR22725
OSTI ID:
1650092
Journal Information:
Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vol. 2020; Conference: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA (United States), 18-22 May 2020
Publisher:
IEEE
Country of Publication:
United States
Language:
English


Similar Records

HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
Journal Article · January 2018 · Nucleic Acids Research

GSoFa: Scalable Sparse Symbolic LU Factorization on GPUs
Journal Article · April 2022 · IEEE Transactions on Parallel and Distributed Systems

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)
Technical Report · November 2019
