OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A study of graph partitioning schemes for parallel graph community detection

Publication Date:
Sponsoring Org.:
OSTI Identifier:
Resource Type:
Journal Article: Publisher's Accepted Manuscript
Journal Name:
Parallel Computing
Additional Journal Information:
Journal Volume: 58; Journal Issue: C; Related Information: CHORUS Timestamp: 2017-10-05 01:48:53; Journal ID: ISSN 0167-8191
Country of Publication:

Citation Formats

Zeng, Jianping, and Yu, Hongfeng. A study of graph partitioning schemes for parallel graph community detection. Netherlands: N. p., 2016. Web. doi:10.1016/j.parco.2016.05.008.
Zeng, Jianping, & Yu, Hongfeng. A study of graph partitioning schemes for parallel graph community detection. Netherlands. doi:10.1016/j.parco.2016.05.008.
Zeng, Jianping, and Yu, Hongfeng. 2016. "A study of graph partitioning schemes for parallel graph community detection". Netherlands. doi:10.1016/j.parco.2016.05.008.
title = {A study of graph partitioning schemes for parallel graph community detection},
author = {Zeng, Jianping and Yu, Hongfeng},
abstractNote = {},
doi = {10.1016/j.parco.2016.05.008},
journal = {Parallel Computing},
number = C,
volume = 58,
place = {Netherlands},
year = 2016,
month =

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at 10.1016/j.parco.2016.05.008

  • Calculations can naturally be described as graphs in which vertices represent computation and edges reflect data dependencies. By partitioning the vertices of a graph, the calculation can be divided among processors of a parallel computer. However, the standard methodology for graph partitioning minimizes the wrong metric and lacks expressibility. We survey several recently proposed alternatives and discuss their relative merits.
  • Efficient use of a distributed memory parallel computer requires that the computational load be balanced across processors in a way that minimizes interprocessor communication. A new domain mapping algorithm is presented that extends recent work in which ideas from spectral graph theory have been applied to this problem. The generalization of spectral graph bisection involves a novel use of multiple eigenvectors to allow for division of a computation into four or eight parts at each stage of a recursive decomposition. The resulting method is suitable for scientific computations like irregular finite elements or differences performed on hypercube or mesh architecture machines. Experimental results confirm that the new method provides better decompositions arrived at more economically and robustly than with previous spectral methods. This algorithm allows for arbitrary nonnegative weights on both vertices and edges to model inhomogeneous computation and communication. A new spectral lower bound for graph bisection is also presented.
  • The authors develop a parallel algorithm for partitioning the vertices of a graph into p greater than or equal to 2 sets in such a way that few edges connect vertices in different sets. The algorithm is intended for a message-passing multiprocessor system, such as the hypercube, and is based on the Kernighan-Lin algorithm for finding small edge separators on a single processor. They use this parallel partitioning algorithm to find orderings for factoring large sparse symmetric positive definite matrices. These orderings not only reduce fill, but also result in good processor utilization and low communication overhead during the factorization. They provide a complexity analysis of the algorithm, as well as some numerical results from an Intel hypercube and a hypercube simulator.
  • The method of discrete ordinates is commonly used to solve the Boltzmann transport equation. The solution in each ordinate direction is most efficiently computed by sweeping the radiation flux across the computational grid. For unstructured grids this poses many challenges, particularly when implemented on distributed-memory parallel machines where the grid geometry is spread across processors. We present several algorithms relevant to this approach: (a) an asynchronous message-passing algorithm that performs sweeps simultaneously in multiple ordinate directions, (b) a simple geometric heuristic to prioritize the computational tasks that a processor works on, (c) a partitioning algorithm that creates columnar-style decompositions for unstructured grids, and (d) an algorithm for detecting and eliminating cycles that sometimes exist in unstructured grids and can prevent sweeps from successfully completing. Algorithms (a) and (d) are fully parallel; algorithms (b) and (c) can be used in conjunction with (a) to achieve higher parallel efficiencies. We describe our message-passing implementations of these algorithms within a radiation transport package. Performance and scalability results are given for unstructured grids with up to 3 million elements (500 million unknowns) running on thousands of processors of Sandia National Laboratories' Intel Tflops machine and DEC-Alpha CPlant cluster.
  • Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable or smaller number of iterations, while providing real speedups of up to 16x using 32 threads.
    Cited by 10
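
The abstracts above refer to two quantities: the number of edges that connect vertices in different sets (the edge cut that partitioners try to keep small) and the modularity that the Louvain method optimizes. As a minimal sketch only, and not code from any of the listed papers, both can be computed for a small undirected, unweighted graph without self-loops as follows. The function names `edge_cut` and `modularity`, and the edge-list and dictionary input formats, are assumptions made for illustration.

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Count edges whose two endpoints are assigned to different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def modularity(edges, part):
    """Newman modularity Q of a partition.

    Assumes an undirected, unweighted graph with no self-loops, given as a
    list of (u, v) pairs, and a dict `part` mapping each vertex to its
    community label (both formats are illustrative assumptions).
    """
    m = len(edges)
    degree = defaultdict(int)     # vertex -> degree
    internal = defaultdict(int)   # community -> number of internal edges
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if part[u] == part[v]:
            internal[part[u]] += 1

    total_degree = defaultdict(int)  # community -> sum of member degrees
    for vertex, d in degree.items():
        total_degree[part[vertex]] += d

    # Q = sum over communities of (internal edges / m) - (degree sum / 2m)^2
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_parts = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_part = {v: 0 for v in range(6)}

print(edge_cut(edges, two_parts))    # only the bridge edge is cut
print(modularity(edges, two_parts))  # positive: the split matches the structure
print(modularity(edges, one_part))   # a single community scores zero
```

In this toy graph, splitting along the bridge gives an edge cut of 1 and a positive modularity, while placing every vertex in one community gives a modularity of zero.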