Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
Abstract
HipMCL is a high-performance distributed memory implementation of the popular Markov Cluster Algorithm (MCL) and can cluster large-scale networks within hours using a few thousand CPU-equipped nodes. It relies on sparse matrix computations and heavily makes use of the sparse matrix-sparse matrix multiplication kernel (SpGEMM). The existing parallel algorithms in HipMCL are not scalable to Exascale architectures, both due to their communication costs dominating the runtime at large concurrencies and also due to their inability to take advantage of accelerators that are increasingly popular. In this work, we systematically remove scalability and performance bottlenecks of HipMCL. We enable GPUs by performing the expensive expansion phase of the MCL algorithm on GPU. Additionally, we propose a CPU-GPU joint distributed SpGEMM algorithm called pipelined Sparse SUMMA and integrate a probabilistic memory requirement estimator that is fast and accurate. Furthermore, we develop a new merging algorithm for the incremental processing of partial results produced by the GPUs, which improves the overlap efficiency and the peak memory usage. We also integrate a recent and faster algorithm for performing SpGEMM on CPUs. We validate our new algorithms and optimizations with extensive evaluations. With the enabling of the GPUs and integration of new algorithms, HipMCL is up to 12.4x faster, being able to cluster a network with 70 million proteins and 68 billion connections just under 15 minutes using 1024 nodes of ORNL's Summit supercomputer.
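For readers unfamiliar with MCL, the following is a minimal single-process sketch of the expansion (SpGEMM) and inflation steps the abstract refers to. It is not HipMCL's implementation: the dict-of-dicts matrix layout, the function names (`spgemm`, `inflate`, `components`, `mcl`), the pruning threshold, and the fixed iteration count are all assumptions made for illustration. The matrix is stored row-stochastic (the transpose of the usual column-stochastic MCL matrix), so inflation is applied per row.

```python
from collections import defaultdict

def spgemm(a, b):
    """Row-wise (Gustavson-style) sparse product c = a @ b on dict-of-dicts matrices."""
    c = {}
    for i, row in a.items():
        acc = defaultdict(float)  # sparse accumulator for output row i
        for k, a_ik in row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        c[i] = dict(acc)
    return c

def inflate(m, r, prune=1e-5):
    """Raise entries to the power r, renormalize each row, drop entries below `prune`."""
    out = {}
    for i, row in m.items():
        powered = {j: v ** r for j, v in row.items()}
        total = sum(powered.values())
        kept = {j: v / total for j, v in powered.items() if v / total >= prune}
        s = sum(kept.values())
        out[i] = {j: v / s for j, v in kept.items()}
    return out

def components(m, n):
    """Connected components of the nonzero pattern of m, via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, row in m.items():
        for j in row:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())

def mcl(edges, n, inflation=2.0, iters=30):
    """Hypothetical driver: cluster an undirected graph on nodes 0..n-1."""
    adj = {i: {i: 1.0} for i in range(n)}  # self-loops, as is customary in MCL
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    m = inflate(adj, 1.0, prune=0.0)  # r = 1 only row-normalizes
    for _ in range(iters):
        m = inflate(spgemm(m, m), inflation)  # expansion, then inflation + pruning
    return components(m, n)

if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    bridged = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(mcl(bridged, 6))
```

In this sketch, `spgemm` is the step that dominates cost on large graphs, which is the step the abstract describes moving to GPUs. A production implementation would test for convergence rather than run a fixed number of iterations.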
- Authors:
- Selvitopi, Oguz; Hussain, Md Taufique; Azad, Ariful; Buluc, Aydin
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Indiana Univ., Bloomington, IN (United States)
- Publication Date:
- 2020-05-01
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1650092
- Grant/Contract Number:
- AC02-05CH11231; AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- Additional Journal Information:
- Journal Name: Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS); Journal Volume: 2020; Conference: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA (United States), 18-22 May 2020
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Selvitopi, Oguz, Hussain, Md Taufique, Azad, Ariful, and Buluc, Aydin. Optimizing High Performance Markov Clustering for Pre-Exascale Architectures. United States: N. p., 2020. Web. doi:10.1109/ipdps47924.2020.00022.
Selvitopi, Oguz, Hussain, Md Taufique, Azad, Ariful, & Buluc, Aydin. Optimizing High Performance Markov Clustering for Pre-Exascale Architectures. United States. https://doi.org/10.1109/ipdps47924.2020.00022
Selvitopi, Oguz, Hussain, Md Taufique, Azad, Ariful, and Buluc, Aydin. 2020. "Optimizing High Performance Markov Clustering for Pre-Exascale Architectures". United States. https://doi.org/10.1109/ipdps47924.2020.00022. https://www.osti.gov/servlets/purl/1650092.
@article{osti_1650092,
title = {Optimizing High Performance Markov Clustering for Pre-Exascale Architectures},
author = {Selvitopi, Oguz and Hussain, Md Taufique and Azad, Ariful and Buluc, Aydin},
abstractNote = {HipMCL is a high-performance distributed memory implementation of the popular Markov Cluster Algorithm (MCL) and can cluster large-scale networks within hours using a few thousand CPU-equipped nodes. It relies on sparse matrix computations and heavily makes use of the sparse matrix-sparse matrix multiplication kernel (SpGEMM). The existing parallel algorithms in HipMCL are not scalable to Exascale architectures, both due to their communication costs dominating the runtime at large concurrencies and also due to their inability to take advantage of accelerators that are increasingly popular. In this work, we systematically remove scalability and performance bottlenecks of HipMCL. We enable GPUs by performing the expensive expansion phase of the MCL algorithm on GPU. Additionally, we propose a CPU-GPU joint distributed SpGEMM algorithm called pipelined Sparse SUMMA and integrate a probabilistic memory requirement estimator that is fast and accurate. Furthermore, we develop a new merging algorithm for the incremental processing of partial results produced by the GPUs, which improves the overlap efficiency and the peak memory usage. We also integrate a recent and faster algorithm for performing SpGEMM on CPUs. We validate our new algorithms and optimizations with extensive evaluations. With the enabling of the GPUs and integration of new algorithms, HipMCL is up to 12.4x faster, being able to cluster a network with 70 million proteins and 68 billion connections just under 15 minutes using 1024 nodes of ORNL's Summit supercomputer.},
doi = {10.1109/ipdps47924.2020.00022},
journal = {Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
volume = 2020,
place = {United States},
year = {2020},
month = {5}
}