Title: Optimizing Approximate Weighted Matching on Nvidia Kepler K40

Abstract

Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms generally compute high-quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on the Nvidia Kepler K40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation. We also experiment with different combinations of work assigned to a warp. Using an exhaustive set of 269 inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by $10\times$ to $100\times$ for over 100 instances, and by $100\times$ to $1000\times$ for 15 instances. We also demonstrate up to $20\times$ speedup relative to 2 threads, and up to $5\times$ relative to 16 threads, on an Intel Xeon platform with 16 cores for the same algorithm. The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, the algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.
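
For readers unfamiliar with the Suitor algorithm named in the abstract, the following is a minimal sequential sketch of the general idea as it is commonly described: each vertex proposes to its heaviest neighbor that does not already hold a better offer, and a displaced proposer tries again. It is not the GPU implementation from the paper. The function name `suitor_matching`, the adjacency-dictionary input format, and the assumption of distinct positive edge weights (so no tie-breaking rule is needed) are all illustrative choices.

```python
def suitor_matching(adj):
    """Sequential sketch of a Suitor-style half-approximate weighted matching.

    adj maps each vertex to a dict {neighbor: weight}. The graph is assumed
    undirected (adj is symmetric) with distinct, positive edge weights.
    Returns a set of frozensets, one per matched edge.
    """
    suitor = {v: None for v in adj}  # best proposer seen so far for each vertex
    ws = {v: 0.0 for v in adj}       # weight of that best proposal

    for u in adj:
        current = u
        while current is not None:
            # Find the heaviest neighbor that current can still win over.
            partner, heaviest = None, 0.0
            for v, w in adj[current].items():
                if w > heaviest and w > ws[v]:
                    partner, heaviest = v, w
            if partner is None:
                break  # current has no neighbor left to propose to
            # Propose to partner; whoever was there before must retry.
            displaced = suitor[partner]
            suitor[partner] = current
            ws[partner] = heaviest
            current = displaced

    # Matched edges are the mutual proposals.
    return {
        frozenset((v, s))
        for v, s in suitor.items()
        if s is not None and suitor[s] == v
    }


# Path a - b - c - d with weights 2, 3, 2 (hypothetical example data).
path = {
    "a": {"b": 2.0},
    "b": {"a": 2.0, "c": 3.0},
    "c": {"b": 3.0, "d": 2.0},
    "d": {"c": 2.0},
}
matching = suitor_matching(path)
```

On this path the sketch selects only the middle edge (weight 3), whereas the optimal matching takes the two outer edges (total weight 4). The result is still within the factor-of-two guarantee that "half-approximate" refers to.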

Authors:
Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh; Tumeo, Antonino; Langguth, Johannes
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1254605
Report Number(s):
PNNL-SA-113350
400470000
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: IEEE 22nd International Conference on High Performance Computing (HiPC 2015), December 16-19, 2015, Bangalore, India, 105-114
Country of Publication:
United States
Language:
English
Subject:
matching; GPU

Citation Formats

Naim, Md, Manne, Fredrik, Halappanavar, Mahantesh, Tumeo, Antonino, and Langguth, Johannes. Optimizing Approximate Weighted Matching on Nvidia Kepler K40. United States: N. p., 2015. Web. doi:10.1109/HiPC.2015.15.
Naim, Md, Manne, Fredrik, Halappanavar, Mahantesh, Tumeo, Antonino, & Langguth, Johannes. Optimizing Approximate Weighted Matching on Nvidia Kepler K40. United States. doi:10.1109/HiPC.2015.15.
Naim, Md, Manne, Fredrik, Halappanavar, Mahantesh, Tumeo, Antonino, and Langguth, Johannes. 2015. "Optimizing Approximate Weighted Matching on Nvidia Kepler K40". United States. doi:10.1109/HiPC.2015.15.
@inproceedings{osti_1254605,
title = {Optimizing Approximate Weighted Matching on Nvidia Kepler K40},
author = {Naim, Md and Manne, Fredrik and Halappanavar, Mahantesh and Tumeo, Antonino and Langguth, Johannes},
abstractNote = {Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms generally compute high-quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on the Nvidia Kepler K40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation. We also experiment with different combinations of work assigned to a warp. Using an exhaustive set of 269 inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by $10\times$ to $100\times$ for over 100 instances, and by $100\times$ to $1000\times$ for 15 instances. We also demonstrate up to $20\times$ speedup relative to 2 threads, and up to $5\times$ relative to 16 threads, on an Intel Xeon platform with 16 cores for the same algorithm. The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, the algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.},
doi = {10.1109/HiPC.2015.15},
booktitle = {IEEE 22nd International Conference on High Performance Computing (HiPC 2015), December 16-19, 2015, Bangalore, India},
pages = {105--114},
place = {United States},
year = {2015},
month = {9}
}
