U.S. Department of Energy
Office of Scientific and Technical Information

ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum

Conference
 [1];  [2];  [1];  [3];  [2];  [4];  [1]
  1. University of Rennes, Inria, CNRS, IRISA
  2. Federal University of Rio de Janeiro
  3. ORNL
  4. University of Montpellier, Inria, CNRS, LIRMM

Modern scientific workflows require hybrid infrastructures that combine numerous decentralized resources on the IoT/Edge with interconnected Cloud/HPC systems (also known as the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, together with their related data and processes, may assist in understanding and optimizing workflow executions. However, the capture overhead can be prohibitive, particularly on resource-constrained devices such as those on the IoT/Edge. To address this challenge, and based on a performance analysis of existing systems, we propose ProvLight, a tool that enables efficient provenance capture on the IoT/Edge. We leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads. We further integrate ProvLight into the E2Clab framework to enable workflow provenance capture across the Edge-to-Cloud Continuum. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments. We validate ProvLight at a large scale with synthetic workloads on 64 real-life IoT/Edge devices in the FIT IoT-LAB testbed. Evaluations show that ProvLight outperforms state-of-the-art systems such as ProvLake and DfAnalyzer on resource-constrained devices. ProvLight is 26-37x faster at capturing and transmitting provenance data, uses 5-7x less CPU and 2x less memory, transmits 2x less data, and consumes 2-2.5x less energy. ProvLight [1] and E2Clab [2] are available as open-source tools.
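
The grouping and compression idea mentioned in the abstract can be illustrated with a minimal sketch. This is not ProvLight's implementation or API: the record fields, the function names `pack_records` and `unpack_payload`, and the choice of JSON with zlib are all assumptions made for illustration, and the transmission step is left out.

```python
import json
import zlib


def pack_records(records, group_size):
    """Group records into batches and compress each batch into one payload.

    Hypothetical helper for illustration only, not part of ProvLight.
    """
    payloads = []
    for start in range(0, len(records), group_size):
        batch = records[start:start + group_size]
        # Compact separators keep the serialized form small before compression.
        raw = json.dumps(batch, separators=(",", ":")).encode("utf-8")
        payloads.append(zlib.compress(raw))
    return payloads


def unpack_payload(payload):
    """Inverse of a single payload: decompress it and parse the batch."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical task records with a deliberately small schema.
records = [
    {"task": i, "status": "finished", "elapsed_s": 0.5}
    for i in range(100)
]

# Bytes needed if every record were serialized and sent on its own.
individual = sum(
    len(json.dumps(r, separators=(",", ":")).encode("utf-8")) for r in records
)
# Bytes needed when records are grouped in batches of 50 and compressed.
grouped = sum(len(p) for p in pack_records(records, group_size=50))

print(f"individual: {individual} bytes, grouped and compressed: {grouped} bytes")
```

Grouping gives the compressor repeated keys and values to exploit, which is why batching before compression tends to shrink the payload more than compressing each small record separately. The actual savings depend on the data model and the batch size.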

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
2301621
Resource Relation:
Conference: IEEE CLUSTER 2023: IEEE International Conference on Cluster Computing, Santa Fe, New Mexico, United States of America, 10/31/2023 to 11/3/2023
Country of Publication:
United States
Language:
English
