skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Decomposition of Large Scale Semantic Graphsvia an Efficient Communities Algorithm

Technical Report ·
DOI:https://doi.org/10.2172/926011· OSTI ID:926011

Semantic graphs have become key components in analyzing complex systems such as the Internet, or biological and social networks. These types of graphs generally consist of sparsely connected clusters or 'communities' whose nodes are more densely connected to each other than to other nodes in the graph. The identification of these communities is invaluable in facilitating the visualization, understanding, and analysis of large graphs by producing subgraphs of related data whose interrelationships can be readily characterized. Unfortunately, the ability of LLNL to effectively analyze the terabytes of multisource data at its disposal has remained elusive, since existing decomposition algorithms become computationally prohibitive for graphs of this size. We have addressed this limitation by developing more efficient algorithms for discerning community structure that can effectively process massive graphs. Current algorithms for detecting community structure, such as the high quality algorithm developed by Girvan and Newman [1], are only capable of processing relatively small graphs. The cubic complexity of Girvan and Newman, for example, makes it impractical for graphs with more than approximately 10{sup 4} nodes. Our goal for this project was to develop methodologies and corresponding algorithms capable of effectively processing graphs with up to 10{sup 9} nodes. From a practical standpoint, we expect the developed scalable algorithms to help resolve a variety of operational issues associated with the productive use of semantic graphs at LLNL. During FY07, we completed a graph clustering implementation that leverages a dynamic graph transformation to more efficiently decompose large graphs. In essence, our approach dynamically transforms the graph (or subgraphs) into a tree structure consisting of biconnected components interconnected by bridge links. This isomorphism allows us to compute edge betweenness, the chief source of inefficiency in Girvan and Newman's decomposition algorithm, much more efficiently, leading to significantly reduced computation time. Test runs on a desktop computer have shown reductions of up to 89%. Our focus this year has been on the implementation of parallel graph clustering on one of LLNL's supercomputers. In order to achieve efficiency in parallel computing, we have exploited the fact that large semantic graphs tend to be sparse, comprising loosely connected dense node clusters. When implemented on distributed memory computers, our approach performed well on several large graphs with up to one billion nodes, as shown in Table 2. The rightmost column of Table 2 contains the associated Newman's modularity [1], a metric that is widely used to assess the quality of community structure. Existing algorithms produce results that merely approximate the optimal solution, i.e., maximum modularity. We have developed a verification tool for decomposition algorithms, based upon a novel integer linear programming (ILP) approach, that computes an exact solution. We have used this ILP methodology to find the maximum modularity and corresponding optimal community structure for several well-studied graphs in the literature (e.g., Figure 1) [3]. The above approaches assume that modularity is the best measure of quality for community structure. In an effort to enhance this quality metric, we have also generalized Newman's modularity based upon an insightful random walk interpretation that allows us to vary the scope of the metric. Generalized modularity has enabled us to develop new, more flexible versions of our algorithms. In developing these methodologies, we have made several contributions to both graph theoretic algorithms and software engineering. We have written two research papers for refereed publication [3-4] and are working on another one [5]. In addition, we have presented our research findings at three academic and professional conferences.

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
926011
Report Number(s):
LLNL-TR-401193; TRN: US200807%%558
Country of Publication:
United States
Language:
English