Streaming data analytics via message passing with application to graph algorithms
Abstract
The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
- Authors:
- Plimpton, Steven J.; Shead, Tim
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Publication Date:
- 2014-05-06
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1062811
- Report Number(s):
- SAND2012-9495J
- Journal ID: ISSN 0743-7315; PII: S0743731514000884
- Grant/Contract Number:
- AC04-94AL85000
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Parallel and Distributed Computing
- Additional Journal Information:
- Journal Volume: 74; Journal Issue: 8; Journal ID: ISSN 0743-7315
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Streaming data; Graph algorithms; Message passing; MPI; Sockets; MapReduce
Citation Formats
Plimpton, Steven J., and Shead, Tim. Streaming data analytics via message passing with application to graph algorithms. United States: N. p., 2014. Web. doi:10.1016/j.jpdc.2014.04.001.
Plimpton, Steven J., & Shead, Tim. (2014). Streaming data analytics via message passing with application to graph algorithms. United States. https://doi.org/10.1016/j.jpdc.2014.04.001
Plimpton, Steven J., and Shead, Tim. 2014. "Streaming data analytics via message passing with application to graph algorithms". United States. https://doi.org/10.1016/j.jpdc.2014.04.001. https://www.osti.gov/servlets/purl/1062811.
@article{osti_1062811,
title = {Streaming data analytics via message passing with application to graph algorithms},
author = {Plimpton, Steven J. and Shead, Tim},
abstractNote = {The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.},
doi = {10.1016/j.jpdc.2014.04.001},
journal = {Journal of Parallel and Distributed Computing},
number = 8,
volume = 74,
place = {United States},
year = {2014},
month = may
}