Streaming data analytics via message passing with application to graph algorithms
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
- Research Organization:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number:
- AC04-94AL85000
- OSTI ID:
- 1062811
- Report Number(s):
- SAND2012--9495J; PII: S0743731514000884
- Journal Information:
- Journal of Parallel and Distributed Computing, Journal Name: Journal of Parallel and Distributed Computing Journal Issue: 8 Vol. 74; ISSN 0743-7315
- Publisher:
- ElsevierCopyright Statement
- Country of Publication:
- United States
- Language:
- English
Similar Records
Parallel Harness for Informatic Stream Hashing
MapReduce MPI library (MR-MPI) v. 1.0