OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Frequent Subgraph Discovery in Large Attributed Streaming Graphs

Abstract

Work on finding frequent subgraphs in large dynamic graphs has so far only considered a dynamic graph as a series of static snapshots taken at various points in time. This representation does not lend itself well to real-time processing of real-world graphs like social networks or internet traffic, which consist of a stream of nodes and edges. In this paper we propose an algorithm that discovers the frequent subgraphs present in a graph represented by a stream of labeled nodes and edges. Our algorithm is efficient and exposes parameters that the user can tune to get interesting patterns from various kinds of graph data. In our model, updates to the graph arrive in the form of batches which contain new nodes and edges. Our algorithm continuously reports the frequent subgraphs that are estimated to be found in the entire graph as each batch arrives. We evaluate our system using five large dynamic graph datasets: the Hetrec 2011 challenge data, Twitter, DBLP, and two synthetic datasets. We evaluate our approach against two popular large graph miners, SUBDUE and GERM. Our experimental results show that we can find the same frequent subgraphs as a non-incremental approach applied to snapshot graphs, and in less time.

Authors:
Ray, Abhik; Holder, Larry; Choudhury, Sutanay
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1178517
Report Number(s):
PNNL-SA-103377
400470000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BIGMINE 2014), August 24, 2014, 36:166-181
Country of Publication:
United States
Language:
English
Subject:
Dynamic graph, Frequent subgraph mining, pattern discovery

Citation Formats

Ray, Abhik, Holder, Larry, and Choudhury, Sutanay. Frequent Subgraph Discovery in Large Attributed Streaming Graphs. United States: N. p., 2014. Web.
Ray, Abhik, Holder, Larry, & Choudhury, Sutanay. Frequent Subgraph Discovery in Large Attributed Streaming Graphs. United States.
Ray, Abhik, Holder, Larry, and Choudhury, Sutanay. 2014. "Frequent Subgraph Discovery in Large Attributed Streaming Graphs". United States.
@article{osti_1178517,
title = {Frequent Subgraph Discovery in Large Attributed Streaming Graphs},
author = {Ray, Abhik and Holder, Larry and Choudhury, Sutanay},
abstractNote = {Work on finding frequent subgraphs in large dynamic graphs has so far only considered a dynamic graph as a series of static snapshots taken at various points in time. This representation does not lend itself well to real-time processing of real-world graphs like social networks or internet traffic, which consist of a stream of nodes and edges. In this paper we propose an algorithm that discovers the frequent subgraphs present in a graph represented by a stream of labeled nodes and edges. Our algorithm is efficient and exposes parameters that the user can tune to get interesting patterns from various kinds of graph data. In our model, updates to the graph arrive in the form of batches which contain new nodes and edges. Our algorithm continuously reports the frequent subgraphs that are estimated to be found in the entire graph as each batch arrives. We evaluate our system using five large dynamic graph datasets: the Hetrec 2011 challenge data, Twitter, DBLP, and two synthetic datasets. We evaluate our approach against two popular large graph miners, SUBDUE and GERM. Our experimental results show that we can find the same frequent subgraphs as a non-incremental approach applied to snapshot graphs, and in less time.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = 2014,
month = 8
}


Similar records:
  • As semantic datasets grow to be very large and divergent, there is a need to identify and exploit their inherent semantic structure for discovery and optimization. Towards that end, we present here a novel methodology to identify the semantic structures inherent in an arbitrary semantic graph dataset. We first present the concept of an extant ontology as a statistical description of the semantic relations present amongst the typed entities modeled in the graph. This serves as a model of the underlying semantic structure to aid in discovery and visualization. We then describe a method of ontological scaling in which the ontology is employed as a hierarchical scaling filter to infer different resolution levels at which the graph structures are to be viewed or analyzed. We illustrate these methods on three large and publicly available semantic datasets containing more than one billion edges each. Keywords: Semantic Web; Visualization; Ontology; Multi-resolution Data Mining.
  • Graph pattern matching involves finding exact or approximate matches for a query subgraph in a larger graph. It has been studied extensively and has strong applications in domains such as computer vision, computational biology, social networks, security and finance. The problem of exact graph pattern matching is often described in terms of subgraph isomorphism, which is NP-complete. The exponential growth in streaming data from online social networks, news and video streams, and the continual need for situational awareness, motivate a solution for finding patterns in streaming updates. This is also the prime driver for the real-time analytics market. Development of incremental algorithms for graph pattern matching on streaming inputs to a continually evolving graph is a nascent area of research. Some of the challenges associated with this problem are the same as those found in continuous query (CQ) evaluation on streaming databases. This paper reviews some of the representative work from the exhaustively researched field of CQ systems and identifies important semantics, constraints and architectural features that are also appropriate for HPC systems performing real-time graph analytics. For each of these features we present a brief discussion of the challenge encountered in the database realm and the approach to the solution, and state its relevance in a high-performance, streaming graph processing framework.