Nona: A Framework for Elastic Stream Provenance
Paper in proceedings, 2024

Forward provenance for streaming queries run by distributed and parallel Stream Processing Engines (SPEs) gives fine-grained insight into input-output data dependencies, enabling, e.g., precise debugging and smart data selection. State-of-the-art provenance frameworks, though, build on an assumption that is unrealistic for distributed systems such as Vehicular Networks and Smart Grids, namely that the whole set of queries in need of provenance is known in advance and static. In real-world use cases, queries are continuously added, removed, and modified over time by both data analysts and the SPEs themselves. Motivated by the lack of solutions for the forward provenance of dynamic sets of queries, we introduce Nona, a novel framework for parallel and distributed streaming queries. We formalize the notion of forward provenance for evolving query sets and prove that it is possible to extend to them the same guarantees the state of the art offers for static query sets. Our evaluation shows that Nona adapts to changes in query sets with sub-second responsiveness; moreover, it incurs negligible overhead compared to the state of the art during periods in which the query set does not change.

Provenance

Elasticity

Stream Processing

Authors

Bastian Havers

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

Marina Papatriantafilou

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

Vincenzo Massimiliano Gulisano

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

Proceedings - International Conference on Distributed Computing Systems

1063-6927 (ISSN) 2575-8411 (eISSN)

703-714
9798350386059 (ISBN)

44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024
Jersey City, USA

VR EPITOME - Summarization and structuring of continuous data in concurrent processing pipelines

Swedish Research Council (VR) (2021-05424), 2022-01-01 -- 2025-12-31.

BADA - On-board Off-board Distributed Data Analytics

VINNOVA (2016-04260), 2016-12-01 -- 2019-12-31.

AUTOSPADA (Automotive Stream Processing and Distributed Analytics) OODIDA Phase 2

VINNOVA (2019-05884), 2020-03-12 -- 2022-12-31.

Relaxed Semantics Across the Data Analytics Stack (RELAX-DN)

European Commission (EC) (EC/HE/101072456), 2023-03-01 -- 2027-03-01.

Subject Categories (SSIF 2011)

Computer Engineering

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Transport

Production

Energy

DOI

10.1109/ICDCS60910.2024.00071

More information

Latest update

12/17/2025