GeneaLog: Fine-grained data streaming provenance in cyber-physical systems
Journal article, 2019
Fine-grained data provenance can be especially useful in cyber-physical systems, such as vehicular networks and smart grids. By enabling the extraction of valuable information from raw sensor data, it could, for instance, reduce data transmission and storage requirements. Since cyber-physical systems can have heterogeneous multi-core architectures, ranging from inexpensive single-board computers to high-end servers, there is a demand for efficient provenance techniques that can take advantage of such parallel architectures with minimal overhead. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard (instrumented) data streaming operators. This is particularly useful to distribute the provenance overheads to operators that can be run in parallel, thus leveraging multi-core architectures. We evaluate two implementations of GeneaLog, one based on Apache Flink, a widely-adopted state-of-the-art Stream Processing Engine, and one based on Liebre, an edge-tailored lightweight Stream Processing Engine. We test them both on vehicular and smart grid applications with single-board embedded devices and a high-end server, also studying how GeneaLog affects their scalability and confirming that it efficiently captures fine-grained provenance data with minimal overhead.
data streaming
provenance
Authors
Dimitrios Palyvos-Giannas
Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)
Vincenzo Massimiliano Gulisano
Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)
Marina Papatriantafilou
Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)
Parallel Computing
0167-8191 (ISSN)
Vol. 89 102552
HARE: Self-deploying and Adaptive Data Streaming Analytics in Fog Architectures
Swedish Research Council (VR) (2016-03800), 2017-01-01 -- 2020-12-31.
INDEED
Chalmers, 2016-01-01 -- 2020-12-31.
STAMINA - GE
Göteborg Energi, Foundation for Research and Development, 2017-01-01 -- 2021-12-31.
Future factories in the Cloud (FiC)
Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.
Areas of Advance
Information and Communication Technology
Subject Categories (SSIF 2011)
Computer Science
DOI
10.1016/j.parco.2019.102552