A High-Throughput Solver for Marginalized Graph Kernels on GPU

Tang, Yu-Hang; Selvitopi, Oguz; Popovici, Doru Thom; Buluc, Aydin

doi:10.1109/ipdps47924.2020.00080

Journal Article · May 1, 2020 · 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

DOI: https://doi.org/10.1109/ipdps47924.2020.00080 · OSTI ID: 1582329

Tang, Yu-Hang [1]; Selvitopi, Oguz [1]; Popovici, Doru Thom [1]; Buluc, Aydin [1]

[1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Here, we present the design and optimization of a solver for efficient and high-throughput computation of the marginalized graph kernel on general-purpose GPUs. The graph kernel is computed using the conjugate gradient method to solve a generalized Laplacian of the tensor product between a pair of graphs. To cope with the large gap between the instruction throughput and the memory bandwidth of GPUs, our solver forms the graph tensor product on the fly without storing it in memory. This is achieved by using the threads in a warp cooperatively to stream the adjacency and edge label matrices of individual graphs in small square matrix blocks called tiles, which are then staged in registers and shared memory for later reuse. Warps across a thread block can further share tiles via shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically by storing only non-empty tiles using a coordinate format and nonzero elements within each tile using bitmaps. We propose a new partition-based reordering algorithm for aggregating nonzero elements of the graphs into fewer but denser tiles to further exploit sparsity. We carry out extensive theoretical analyses on the graph tensor product primitives for tiles of various densities and evaluate their performance on synthetic and real-world datasets. Our solver delivers three to four orders of magnitude speedup over existing CPU-based solvers such as GraKeL and GraphKernels. The capability of the solver enables kernel-based learning tasks at unprecedented scales.
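
The central idea of the abstract, running conjugate gradient on a tensor product system without ever storing the product matrix, can be illustrated with a small CPU sketch. This is not the authors' GPU implementation. It assumes a simplified, unlabeled system of the form (D - A1 ⊗ A2) x = b, where the diagonal D, the parameter q, the all-ones right-hand side, and the function names (kron_laplacian_matvec, conjugate_gradient, solve_pair) are all hypothetical choices made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def kron_laplacian_matvec(A1, A2, diag, x):
    """Return (diag(diag) - A1 (x) A2) @ x without materializing A1 (x) A2.

    Row (i, ip) of the product graph is stored at flat index i * n2 + ip.
    """
    n1, n2 = len(A1), len(A2)
    y = [diag[k] * x[k] for k in range(n1 * n2)]
    for i in range(n1):
        for j in range(n1):
            a = A1[i][j]
            if a == 0.0:
                continue  # skip absent edges of the first graph
            for ip in range(n2):
                acc = 0.0
                for jp in range(n2):
                    acc += A2[ip][jp] * x[j * n2 + jp]
                y[i * n2 + ip] -= a * acc
    return y


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def solve_pair(A1, A2, q=0.5):
    """Solve the illustrative system for one pair of graphs.

    Assumption: the diagonal is (deg1 + q) * (deg2 + q) with q > 0, which
    makes the matrix strictly diagonally dominant and hence positive
    definite for symmetric, non-negative adjacency matrices.
    """
    n1, n2 = len(A1), len(A2)
    deg1 = [sum(row) for row in A1]
    deg2 = [sum(row) for row in A2]
    diag = [(deg1[i] + q) * (deg2[ip] + q)
            for i in range(n1) for ip in range(n2)]
    b = [1.0] * (n1 * n2)
    x = conjugate_gradient(
        lambda v: kron_laplacian_matvec(A1, A2, diag, v), b)
    return x, diag, b
```

Calling solve_pair with the adjacency matrices of two small graphs returns the solution vector together with the diagonal and right-hand side used. The sketch only ever holds the two per-graph matrices and a few vectors of length n1 * n2 in memory, which is the property the abstract relies on. The tiling, bitmap storage, and warp-level cooperation described above are not modeled here.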

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: AC02-05CH11231; AC05-00OR22725

OSTI ID: 1582329

Journal Information: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vol. 2020; Related Information: see also on arXiv abs/1910.06310

Publisher: IEEE

Country of Publication: United States

Language: English

References (17)

The Protein Data Bank · Berman, H. M. · Nucleic Acids Research, Vol. 28, Issue 1 · journal · January 2000 · https://doi.org/10.1093/nar/28.1.235
A linear-time heuristic for improving network partitions · Fiduccia, C. M.; Mattheyses, R. M. · Papers on Twenty-five years of electronic design automation - 25 years of DAC · conference · January 1988 · https://doi.org/10.1145/62882.62910
Improving performance of sparse matrix-vector multiplication · Pinar, Ali; Heath, Michael T. · Proceedings of the 1999 ACM/IEEE conference on Supercomputing · conference · January 1999 · https://doi.org/10.1145/331532.331562
Accelerating dissipative particle dynamics simulations on GPUs: Algorithms, numerics and applications · Tang, Yu-Hang; Karniadakis, George Em · Computer Physics Communications, Vol. 185, Issue 11 · journal · November 2014 · https://doi.org/10.1016/j.cpc.2014.06.015
Think Locally, Act Globally: Highly Balanced Graph Partitioning · Sanders, Peter; Schulz, Christian · Experimental Algorithms · book · January 2013 · https://doi.org/10.1007/978-3-642-38527-8_16
An effective multilevel tabu search approach for balanced graph partitioning · Benlic, Una; Hao, Jin-Kao · Computers & Operations Research, Vol. 38, Issue 7 · journal · July 2011 · https://doi.org/10.1016/j.cor.2010.10.007
DrugBank 5.0: a major update to the DrugBank database for 2018 · Wishart, David S.; Feunang, Yannick D.; Guo, An C. · Nucleic Acids Research, Vol. 46, Issue D1 · journal · November 2017 · https://doi.org/10.1093/nar/gkx1037
graphkernels: R and Python packages for graph comparison · Sugiyama, Mahito; Ghisu, M. Elisabetta; Llinares-López, Felipe · Bioinformatics, Vol. 34, Issue 3 · journal · September 2017 · https://doi.org/10.1093/bioinformatics/btx602
Protein function prediction via graph kernels · Borgwardt, K. M.; Ong, C. S.; Schonauer, S. · Bioinformatics, Vol. 21, Issue Suppl 1 · journal · June 2005 · https://doi.org/10.1093/bioinformatics/bti1007
Roofline: an insightful visual performance model for multicore architectures · Williams, Samuel; Waterman, Andrew; Patterson, David · Communications of the ACM, Vol. 52, Issue 4 · journal · April 2009 · https://doi.org/10.1145/1498765.1498785
Prediction of atomization energy using graph kernel and active learning · Tang, Yu-Hang; de Jong, Wibe A. · The Journal of Chemical Physics, Vol. 150, Issue 4 · journal · January 2019 · https://doi.org/10.1063/1.5078640
A Recursive Hypergraph Bipartitioning Framework for Reducing Bandwidth and Latency Costs Simultaneously · Selvitopi, Oguz; Acer, Seher; Aykanat, Cevdet · IEEE Transactions on Parallel and Distributed Systems · journal · January 2016 · https://doi.org/10.1109/TPDS.2016.2577024
Automated scientific software scripting with SWIG · Beazley, D. M. · Future Generation Computer Systems, Vol. 19, Issue 5 · journal · July 2003 · https://doi.org/10.1016/S0167-739X(02)00171-1
Cython: The Best of Both Worlds · Behnel, Stefan; Bradshaw, Robert; Citro, Craig · Computing in Science & Engineering, Vol. 13, Issue 2 · journal · March 2011 · https://doi.org/10.1109/MCSE.2010.118
Parallel algorithms for tensor product-based inexact graph matching · Livi, Lorenzo; Rizzi, Antonello · The 2012 International Joint Conference on Neural Networks (IJCNN) · conference · June 2012 · https://doi.org/10.1109/IJCNN.2012.6252681
Global alignment of multiple protein interaction networks with application to functional orthology detection · Singh, R.; Xu, J.; Berger, B. · Proceedings of the National Academy of Sciences, Vol. 105, Issue 35 · journal · August 2008 · https://doi.org/10.1073/pnas.0806627105
Design of the GraphBLAS API for C · Buluc, Aydin; Mattson, Tim; McMillan, Scott · 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) · conference · May 2017 · https://doi.org/10.1109/IPDPSW.2017.117

Related Subjects

97 MATHEMATICS AND COMPUTING
kernel
symmetric matrices
linear systems
mathematical model
tensile stress
task analysis
graphics processing units
