GeneaLog: Fine-grained data streaming provenance in cyber-physical systems
Journal article, 2019
Fine-grained data provenance can be especially useful in cyber-physical systems, such as vehicular networks and smart grids. By enabling the extraction of valuable information from raw sensor data, it could, for instance, reduce data transmission and storage requirements. Since cyber-physical systems can have heterogeneous multi-core architectures, ranging from inexpensive single-board computers to high-end servers, there is a demand for efficient provenance techniques that can take advantage of such parallel architectures with minimal overhead. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard (instrumented) data streaming operators. This is particularly useful to distribute the provenance overheads to operators that can be run in parallel, thus leveraging multi-core architectures. We evaluate two implementations of GeneaLog, one based on Apache Flink, a widely-adopted state-of-the-art Stream Processing Engine, and one based on Liebre, an edge-tailored lightweight Stream Processing Engine. We test them both on vehicular and smart grid applications with single-board embedded devices and a high-end server, also studying how GeneaLog affects their scalability and confirming that it efficiently captures fine-grained provenance data with minimal overhead.
data streaming
provenance
Authors
Dimitrios Palyvos-Giannas
Chalmers, Computer Science and Engineering, Networks and Systems
Vincenzo Massimiliano Gulisano
Chalmers, Computer Science and Engineering, Networks and Systems
Marina Papatriantafilou
Chalmers, Computer Science and Engineering, Networks and Systems
Parallel Computing
0167-8191 (ISSN)
Vol. 89 102552
HAREN: Self-deploying and adaptive data streaming analytics in the fog
Swedish Research Council (VR) (2016-03800), 2017-01-01 -- 2020-12-31.
INDEED
Chalmers, 2016-01-01 -- 2020-12-31.
STAMINA - GE
Göteborg Energi, Forskningsstiftelsen, 2017-01-01 -- 2021-12-31.
Cloud-based products and production (FiC)
Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.
Areas of Advance
Information and Communication Technology
Subject Categories
Computer Science
DOI
10.1016/j.parco.2019.102552