OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A High-Throughput Solver for Marginalized Graph Kernels on GPU

Journal Article · 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
 [1];  [1];  [1];  [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Here, we present the design and optimization of a solver for efficient, high-throughput computation of the marginalized graph kernel on general-purpose GPUs. The graph kernel is computed by using the conjugate gradient method to solve a linear system defined by a generalized Laplacian of the tensor product between a pair of graphs. To cope with the large gap between the instruction throughput and the memory bandwidth of GPUs, our solver forms the graph tensor product on the fly without storing it in memory. This is achieved by having the threads in a warp cooperatively stream the adjacency and edge label matrices of the individual graphs in small square matrix blocks called tiles, which are then staged in registers and shared memory for later reuse. Warps across a thread block can further share tiles via shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically by storing only non-empty tiles in a coordinate format and the nonzero elements within each tile as bitmaps. We propose a new partition-based reordering algorithm that aggregates the nonzero elements of the graphs into fewer but denser tiles to further exploit sparsity. We carry out extensive theoretical analyses of the graph tensor product primitives for tiles of varying density and evaluate their performance on synthetic and real-world datasets. Our solver delivers a speedup of three to four orders of magnitude over existing CPU-based solvers such as GraKeL and GraphKernels. The capability of the solver enables kernel-based learning tasks at unprecedented scales.
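The key idea of forming the tensor product on the fly can be illustrated with a small, CPU-only sketch. It assumes a simplified system M x = b with M = diag(d) − W, where W couples product vertex (i, j) to (k, l) with weight A[i][k]·B[j][l]·κ(label_A, label_B), and d(i, j) = deg_A(i)·deg_B(j) + 1. The "+ 1" regularizer, the edge kernel `edge_kernel`, and all function names are hypothetical choices made only so the system is symmetric positive definite; vertex kernels and starting/stopping probabilities of the actual marginalized graph kernel are omitted. Only the two factor graphs are stored, and each matrix-vector product regenerates the entries of M as needed.

```python
# Hypothetical sketch: matrix-free CG on an implicit graph tensor product.
# Assumed system (illustrative, not the paper's exact formulation):
#   M x = b,  M = diag(d) - W,
#   W[(i,j),(k,l)] = A[i][k] * B[j][l] * edge_kernel(EA[i][k], EB[j][l]),
#   d[(i,j)]       = deg_A(i) * deg_B(j) + 1.
# For nonnegative symmetric adjacency matrices and edge_kernel in [0, 1],
# M is symmetric and strictly diagonally dominant, hence positive definite.


def edge_kernel(a, b):
    """Hypothetical edge-label similarity: 1.0 for equal labels, else 0.5."""
    return 1.0 if a == b else 0.5


def product_matvec(A, EA, B, EB, x):
    """Return y = M x, generating entries of M on the fly instead of storing it."""
    n, m = len(A), len(B)
    deg_a = [sum(row) for row in A]
    deg_b = [sum(row) for row in B]
    y = [0.0] * (n * m)
    for i in range(n):
        for j in range(m):
            # Diagonal term; product vertex (i, j) is stored at index i * m + j.
            acc = (deg_a[i] * deg_b[j] + 1.0) * x[i * m + j]
            for k in range(n):
                if A[i][k] == 0:
                    continue  # skip non-edges of the first graph
                for l in range(m):
                    if B[j][l] == 0:
                        continue  # skip non-edges of the second graph
                    w = A[i][k] * B[j][l] * edge_kernel(EA[i][k], EB[j][l])
                    acc -= w * x[k * m + l]
            y[i * m + j] = acc
    return y


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Textbook CG for a symmetric positive-definite operator given as a function."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - M x for the initial guess x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        mp = matvec(p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, mp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, mp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Usage: a labeled triangle paired with a labeled two-vertex path (6 product vertices).
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
EA = [[None, "a", "b"], ["a", None, "a"], ["b", "a", None]]
B = [[0, 1], [1, 0]]
EB = [[None, "a"], ["a", None]]
b = [1.0] * 6
x = conjugate_gradient(lambda v: product_matvec(A, EA, B, EB, v), b)
```

The n·m × n·m product matrix never exists in memory; it lives only inside `product_matvec`. The GPU solver described in the abstract applies the same principle at tile granularity with cooperating warps, whereas this pure-Python loop only shows the arithmetic.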
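The two-level sparse storage (a coordinate list of non-empty tiles, plus a bitmap of nonzeros inside each tile) and the effect of reordering on tile density can be sketched as follows. The tile size T = 8 (so each bitmap fits in a 64-bit word), the dict-based coordinate list, and every function name are illustrative assumptions. `reorder_by_partition` is a deliberately naive stand-in that simply makes each part contiguous; here the partition is supplied by hand rather than computed by a graph partitioner, so this is not the paper's reordering algorithm.

```python
# Hypothetical sketch: two-level sparse storage of an adjacency matrix.
# Level 1: coordinate list (a dict keyed by (tile_row, tile_col)) of non-empty tiles.
# Level 2: one T*T-bit bitmap per tile marking its nonzero entries.
T = 8  # illustrative tile edge length; 8 * 8 = 64 bits per bitmap


def to_tiles(edges):
    """Map each nonzero (u, v) to its tile and set the corresponding bitmap bit."""
    tiles = {}
    for u, v in edges:
        key = (u // T, v // T)
        bit = (u % T) * T + (v % T)  # row-major position inside the tile
        tiles[key] = tiles.get(key, 0) | (1 << bit)
    return tiles


def from_tiles(tiles):
    """Expand the tiled representation back into a set of (u, v) nonzeros."""
    out = set()
    for (ti, tj), bitmap in tiles.items():
        for bit in range(T * T):
            if bitmap >> bit & 1:
                out.add((ti * T + bit // T, tj * T + bit % T))
    return out


def popcount(bitmap):
    """Number of nonzeros stored in one tile."""
    return bin(bitmap).count("1")


def reorder_by_partition(part):
    """Naive stand-in for partition-based reordering: make each part contiguous.

    `part[v]` is the part index of vertex v; returns `order` with order[new] = old.
    Python's sort is stable, so vertices keep their relative order within a part.
    """
    return sorted(range(len(part)), key=lambda v: part[v])


def relabel(edges, order):
    """Apply the permutation order[new] = old to an edge list."""
    new_id = {old: new for new, old in enumerate(order)}
    return [(new_id[u], new_id[v]) for u, v in edges]


# Usage: 4 cliques of 8 vertices with interleaved labels (vertex v is in clique v % 4).
n = 32
part = [v % 4 for v in range(n)]
edges = [(u, v) for u in range(n) for v in range(n) if u != v and part[u] == part[v]]
before = to_tiles(edges)
after = to_tiles(relabel(edges, reorder_by_partition(part)))
```

In this toy graph the same 224 nonzeros occupy 16 sparsely filled tiles (8 or 16 nonzeros each) before reordering, but only 4 diagonal tiles (56 nonzeros each) afterwards. Since a pair of tiles with bitmaps a and b contributes popcount(a) × popcount(b) nonzeros to the product graph, fewer and denser tiles give more useful work per tile loaded.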

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-05CH11231; AC05-00OR22725
OSTI ID:
1582329
Journal Information:
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vol. 2020; Related Information: see also arXiv:1910.06310
Publisher:
IEEE
Country of Publication:
United States
Language:
English
