OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection

Conference

Graph mining is an important data analysis methodology, but it struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure that the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that, for heterogeneous graphs, Triangles and Diversity perform better than degree-based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases. We present empirical results to show that the performance degradation follows a logistic function.
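
The abstract does not say how the graph metrics are turned into a sample, so the following is only a minimal sketch under stated assumptions: nodes are ranked by the number of triangles they participate in, the top fraction is kept, and the induced subgraph is returned. The function names (`triangle_counts`, `sample_by_triangles`, `logistic`) and the logistic parameters `L`, `k`, and `x0` are hypothetical and are not taken from the paper; the logistic curve is included only to illustrate the general shape the abstract reports for performance degradation.

```python
from itertools import combinations
import math


def triangle_counts(adj):
    """Count the triangles each node participates in.

    `adj` maps each node to the set of its neighbours
    (undirected graph, no self-loops).
    """
    counts = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbours
        # that are themselves adjacent.
        counts[node] = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return counts


def sample_by_triangles(adj, fraction):
    """Keep the top `fraction` of nodes by triangle count (assumed scheme)
    and return the subgraph induced on those nodes."""
    k = max(1, round(fraction * len(adj)))
    counts = triangle_counts(adj)
    # Ties are broken by node name so the result is deterministic.
    ranked = sorted(adj, key=lambda n: (-counts[n], str(n)))
    keep = set(ranked[:k])
    return {n: adj[n] & keep for n in keep}


def logistic(x, L=1.0, k=1.0, x0=0.0):
    """Generic logistic curve L / (1 + exp(-k * (x - x0))).

    The parameter values are placeholders, not fitted results.
    """
    return L / (1.0 + math.exp(-k * (x - x0)))


# Toy graph: a triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
sample = sample_by_triangles(graph, 0.6)
```

On this toy graph, keeping 60% of the nodes retains the triangle and drops the tail, which is the kind of structure-preserving behavior a triangle-based metric is meant to favor. A logistic curve is flat far from its midpoint `x0` and steep near it, which matches the abstract's description of degradation as abrupt rather than gradual.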

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1434886
Report Number(s):
PNNL-SA-128679
Resource Relation:
Conference: IEEE International Conference on Big Data (Big Data 2017), December 11-14, 2017, Boston, Massachusetts, 1000-1005
Country of Publication:
United States
Language:
English

Similar Records

Graph processing platforms at scale: practices and experiences
Conference · Thu Jan 01 00:00:00 EST 2015 · OSTI ID:1434886

Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite
Conference · Thu Nov 19 00:00:00 EST 2020 · OSTI ID:1434886

EAGLE: 'EAGLE' Is an Algorithmic Graph Library for Exploration
Software · Fri Jan 16 00:00:00 EST 2015 · OSTI ID:1434886