Parallel processing of filtered queries in attributed semantic graphs

Lugowski, Adam; Kamil, Shoaib; Buluç, Aydın; Williams, Samuel; Duriakova, Erika; Oliker, Leonid; Fox, Armando; Gilbert, John R.

doi:10.1016/j.jpdc.2014.08.010

Journal Article · Wed Sep 03 2014 · Journal of Parallel and Distributed Computing

DOI: https://doi.org/10.1016/j.jpdc.2014.08.010 · OSTI ID: 1524030

Lugowski, Adam [1]; Kamil, Shoaib [2]; Buluç, Aydın [3]; Williams, Samuel [3]; Duriakova, Erika [4]; Oliker, Leonid [3]; Fox, Armando [5]; Gilbert, John R. [5]

[1] Univ. of California, Santa Barbara, CA (United States)
[2] Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
[3] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
[4] Univ. College Dublin (Ireland)
[5] Univ. of California, Berkeley, CA (United States)

Execution of complex analytic queries on massive semantic graphs is a challenging problem in big-data analytics that requires high-performance parallel computing. In a semantic graph, vertices and edges carry attributes of various types, and analytic queries typically depend on the values of these attributes. Thus, the computation must view the graph through a filter that passes only those individual vertices and edges of interest. Previous work developed the Knowledge Discovery Toolbox (KDT), a sophisticated Python library for parallel graph computations. In KDT, the user can write custom graph algorithms by specifying operations between edges and vertices (semiring operations). The user can also customize existing graph algorithms by writing filters. While the high-level language for this customization enables domain scientists to productively express their graph analytics requirements, the customized queries perform poorly due to the overhead of having to call into the Python virtual machine for each vertex and edge. In this work, we use the Selective Embedded Just-In-Time Specialization (SEJITS) approach to automatically translate semiring operations and filters defined by programmers into a lower-level efficiency language, bypassing the upcall into Python. We evaluate our approach by comparing it with the high-performance Combinatorial BLAS engine and show that our approach combines the benefits of programming in a high-level language with executing in a low-level parallel environment. We increase the system's flexibility by developing techniques that provide users with the ability to define new vertex and edge types from Python. We also present a new Roofline model for graph traversals and show that we achieve performance that is significantly closer to the bounds suggested by the Roofline. Finally, to further understand the complex interaction with the underlying architecture, we present an analysis using performance counters that quantifies the improvement in hardware behavior in the context of our SEJITS methodology. Overall, we demonstrate the first known solution to the problem of obtaining high performance from a productivity language when applying graph algorithms selectively on semantic graphs with hundreds of millions of edges and scaling to thousands of processors.
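To make the notion of a filtered query concrete, the following is a minimal pure-Python sketch of a breadth-first search that follows only edges accepted by a user-supplied predicate. It does not use KDT or Combinatorial BLAS and is not the authors' implementation; the function `filtered_bfs`, the adjacency-list layout, and the attribute names `type` and `ts` are all hypothetical choices made for illustration. The predicate is invoked once per edge examined, which is the kind of per-edge call into the Python interpreter that the abstract identifies as the source of overhead.

```python
from collections import deque


def filtered_bfs(adj, source, edge_filter):
    """Breadth-first search that only follows edges accepted by edge_filter.

    adj maps a vertex to a list of (neighbor, attrs) pairs, where attrs is a
    dict of edge attributes (hypothetical layout, chosen for illustration).
    edge_filter(attrs) returns True if the edge should be traversed.
    Returns a dict mapping each reached vertex to its BFS parent.
    """
    parent = {source: source}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v, attrs in adj.get(u, []):
            # The filter is a Python callable evaluated for every candidate
            # edge; on a large graph this per-edge call dominates the cost.
            if v not in parent and edge_filter(attrs):
                parent[v] = u
                frontier.append(v)
    return parent


# Hypothetical toy graph: edges carry a type and a timestamp.
adj = {
    0: [(1, {"type": "follows", "ts": 10}), (2, {"type": "retweet", "ts": 5})],
    1: [(3, {"type": "retweet", "ts": 20})],
    2: [(3, {"type": "retweet", "ts": 7})],
    3: [],
}


def recent_retweets(attrs):
    """Example filter: keep only retweet edges with a timestamp below 15."""
    return attrs["type"] == "retweet" and attrs["ts"] < 15


parents = filtered_bfs(adj, 0, recent_retweets)
```

With this filter, vertex 1 is never reached because the only edge into it has type `follows`, while vertex 3 is reached through vertex 2. Passing a filter that accepts every edge recovers an ordinary BFS over the whole graph.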

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: AC02-05CH11231; FA8750-10-1-0191; CNS-0960316; 8-482526701

OSTI ID: 1524030

Alternate ID(s): OSTI ID: 1249863

Journal Information: Journal of Parallel and Distributed Computing, Vol. 79-80, Issue C; ISSN 0743-7315

Publisher: Elsevier

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 9 works

Citation information provided by Web of Science

Similar Records

High-performance analysis of filtered semantic graphs

Conference · Sun Jan 01 2012 · OSTI ID: 1524030

Buluc, Aydin; Fox, Armando; Gilbert, John R.; +4 more

Scalable Pattern Matching in Metadata Graphs via Constraint Checking

Journal Article · Mon Jan 04 2021 · ACM Transactions on Parallel Computing · OSTI ID: 1524030

Reza, Tahsin; Halawa, Hassan; Ripeanu, Matei; +2 more

Algorithms and architectures for high performance analysis of semantic graphs.

Technical Report · Thu Sep 01 2005 · OSTI ID: 1524030

Hendrickson, Bruce Alan

Related Subjects

97 MATHEMATICS AND COMPUTING
Graph analysis systems
Attributed semantic graphs
Graph filtering
Parallel computing
Knowledge discovery
Domain-specific languages
SEJITS
High-performance graph analysis
