Nona: A Framework for Elastic Stream Provenance
Paper in proceedings, 2024

Forward provenance for streaming queries run by distributed and parallel Stream Processing Engines (SPEs) gives fine-grained insight into input-output data dependencies, enabling, e.g., precise debugging and smart data selection. State-of-the-art provenance frameworks, though, build on an assumption that is unrealistic for distributed systems like Vehicular Networks and Smart Grids, namely, that the whole set of queries in need of provenance is known in advance and static. In real-world use cases, queries are continuously added, removed, and modified over time by both data analysts and the SPEs themselves. Motivated by the lack of solutions for the forward provenance of dynamic sets of queries, we introduce a novel framework, named Nona, for parallel and distributed streaming queries. We formalize the notion of forward provenance for evolving query sets and prove that it is possible to extend to them the same guarantees the state of the art offers for static query sets. Our evaluation shows that Nona adapts to changes in query sets with sub-second responsiveness; moreover, it incurs negligible overhead compared to the state of the art during periods in which the query set does not change.

Stream Processing

Provenance

Elasticity

Authors

Bastian Havers

Networks and Systems

Marina Papatriantafilou

Networks and Systems

Vincenzo Massimiliano Gulisano

Networks and Systems

Proceedings - International Conference on Distributed Computing Systems

1063-6927 (ISSN) 2575-8411 (eISSN)

703-714
9798350386059 (ISBN)

44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024
Jersey City, USA

AutoSPADA (Automotive Stream Processing and Distributed Analytics) OODIDA Phase 2

VINNOVA (2019-05884), 2020-03-12 -- 2022-12-31.

Relaxed Semantics Across the Data Analytics Stack (RELAX)

European Commission (EU) (EC/H2020/101072456), 2023-03-01 -- 2027-02-28.

BADA - On-board Off-board Distributed Data Analytics

VINNOVA (2016-04260), 2016-12-01 -- 2019-12-31.

Subject categories

Computer Engineering

Computer Science

Computer Systems

DOI

10.1109/ICDCS60910.2024.00071

More information

Last updated

2024-09-18