Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Indiana Univ., Bloomington, IN (United States)
HipMCL is a high-performance distributed-memory implementation of the popular Markov Cluster Algorithm (MCL) that can cluster large-scale networks within hours using a few thousand CPU-equipped nodes. It relies on sparse matrix computations and makes heavy use of the sparse matrix-sparse matrix multiplication kernel (SpGEMM). The existing parallel algorithms in HipMCL do not scale to exascale architectures, both because their communication costs dominate the runtime at large concurrencies and because they cannot take advantage of the accelerators that are increasingly common. In this work, we systematically remove scalability and performance bottlenecks of HipMCL. We enable GPUs by performing the expensive expansion phase of the MCL algorithm on the GPU. We also propose a CPU-GPU joint distributed SpGEMM algorithm called pipelined Sparse SUMMA and integrate a fast and accurate probabilistic memory requirement estimator. Furthermore, we develop a new merging algorithm for the incremental processing of partial results produced by the GPUs, which improves overlap efficiency and peak memory usage. We also integrate a recent, faster algorithm for performing SpGEMM on CPUs. We validate our new algorithms and optimizations with extensive evaluations. With GPUs enabled and the new algorithms integrated, HipMCL is up to 12.4x faster and can cluster a network with 70 million proteins and 68 billion connections in under 15 minutes using 1024 nodes of ORNL's Summit supercomputer.
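For context on the kernel being optimized, the sketch below shows a single serial MCL iteration using SciPy sparse matrices: the expansion step is the SpGEMM (here, squaring a column-stochastic matrix) that HipMCL distributes across nodes and offloads to GPUs, followed by inflation and pruning. This is a minimal illustrative sketch under simplifying assumptions, not HipMCL's actual code; the function name and parameters are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def mcl_iteration(M, inflation=2.0, prune_threshold=1e-4):
    """One Markov Cluster (MCL) iteration on a column-stochastic sparse matrix.

    Illustrative only: parameter names and defaults are assumptions, not
    HipMCL's interface. The expansion step (M @ M) is the SpGEMM kernel
    that dominates MCL's cost.
    """
    # Expansion: sparse matrix-sparse matrix multiply (SpGEMM)
    # spreads flow along longer paths in the network.
    M = M @ M

    # Inflation: element-wise power strengthens intra-cluster flow.
    M = M.power(inflation)

    # Pruning: drop tiny entries to keep the matrix sparse.
    M.data[M.data < prune_threshold] = 0.0
    M.eliminate_zeros()

    # Re-normalize columns so M stays column-stochastic.
    col_sums = np.asarray(M.sum(axis=0)).ravel()
    col_sums[col_sums == 0.0] = 1.0  # avoid division by zero for empty columns
    return (M @ sp.diags(1.0 / col_sums)).tocsr()
```

In practice the iteration is repeated until the matrix converges to a (near-)idempotent state, and connected components of the resulting matrix give the clusters; HipMCL performs the same computation in distributed memory, which is where pipelined Sparse SUMMA and the GPU offload apply.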
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- Grant/Contract Number:
- AC02-05CH11231; AC05-00OR22725
- OSTI ID:
- 1650092
- Journal Information:
- Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vol. 2020; Conference: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA (United States), 18-22 May 2020
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English