GeneaLog: Fine-grained data streaming provenance in cyber-physical systems

Dimitrios Palyvos-Giannas; Vincenzo Massimiliano Gulisano; Marina Papatriantafilou

doi:10.1016/j.parco.2019.102552

Journal article, 2019

Streaming applications continuously process data to deliver streams of up-to-date results. Their growing adoption for data analysis in many distributed systems is motivated by their performance (in terms of processing throughput and latency) and their support for easy-to-program distributed and parallel analysis. When streaming applications are designed to detect unusual or critical events (e.g., security- or safety-related), it can be beneficial to maintain the associated source data for further analysis. This can be achieved by fine-grained data provenance, which links each detected event back to the source data that contributed to it, making it possible to distinguish and isolate the source data that generated such events.

Fine-grained data provenance can be especially useful in cyber-physical systems, such as vehicular networks and smart grids. By enabling the extraction of valuable information from raw sensor data, it could, for instance, reduce data transmission and storage requirements. Since cyber-physical systems can have heterogeneous multi-core architectures, ranging from inexpensive single-board computers to high-end servers, there is a demand for efficient provenance techniques that can take advantage of such parallel architectures with minimal overhead. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard (instrumented) data streaming operators. This is particularly useful for distributing the provenance overhead across operators that can run in parallel, thus leveraging multi-core architectures. We evaluate two implementations of GeneaLog: one based on Apache Flink, a widely adopted state-of-the-art Stream Processing Engine, and one based on Liebre, an edge-tailored lightweight Stream Processing Engine. We test both on vehicular and smart grid applications with single-board embedded devices and a high-end server, study how GeneaLog affects their scalability, and confirm that it efficiently captures fine-grained provenance data with minimal overhead.
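To make the idea of constant-size per-tuple provenance metadata more concrete, the following Python sketch shows one way an output tuple could be linked back to its source tuples through a fixed number of references. It is an illustration under stated assumptions, not the GeneaLog implementation: the class `StreamTuple`, the fields `kind`, `u1`, `u2` and `nxt`, and the helpers `map_op`, `aggregate_op` and `sources` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class StreamTuple:
    """A stream tuple with a fixed number of provenance references.

    All field names are assumptions made for this sketch.
    """
    ts: int
    value: float
    kind: str = "SOURCE"                    # "SOURCE", "MAP" or "AGGREGATE"
    u1: Optional["StreamTuple"] = None      # first contributing upstream tuple
    u2: Optional["StreamTuple"] = None      # last contributing upstream tuple
    nxt: Optional["StreamTuple"] = None     # next tuple in the same input stream


def map_op(t: StreamTuple, fn) -> StreamTuple:
    # A one-to-one operator: the output references its single input.
    return StreamTuple(t.ts, fn(t.value), kind="MAP", u1=t)


def aggregate_op(window: List[StreamTuple]) -> StreamTuple:
    # Chain the window's tuples in stream order, so the output only needs
    # references to the first and last tuple, whatever the window size.
    for a, b in zip(window, window[1:]):
        a.nxt = b
    total = sum(t.value for t in window)
    return StreamTuple(window[-1].ts, total, kind="AGGREGATE",
                       u1=window[0], u2=window[-1])


def sources(t: StreamTuple) -> List[StreamTuple]:
    # Walk the references backwards until source tuples are reached.
    if t.kind == "SOURCE":
        return [t]
    if t.kind == "MAP":
        return sources(t.u1)
    found: List[StreamTuple] = []
    cur = t.u1
    while cur is not None:
        found.extend(sources(cur))
        if cur is t.u2:
            break
        cur = cur.nxt
    return found


# Hypothetical sensor readings; the third one is an outlier.
readings = [StreamTuple(ts=i, value=v) for i, v in enumerate([1.0, 2.0, 50.0, 3.0])]
scaled = [map_op(t, lambda v: v * 2) for t in readings]
alert = aggregate_op(scaled[1:4])           # window covering ts 1..3
contributing = sources(alert)               # the raw readings behind the alert
```

In this sketch every tuple carries the same small set of references regardless of how many inputs contributed to it, and the set of contributing source tuples is only materialized when `sources` is called on an event of interest.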

data streaming

provenance

Authors

Dimitrios Palyvos-Giannas

Chalmers, Computer Science and Engineering, Networks and Systems

Vincenzo Massimiliano Gulisano

Chalmers, Computer Science and Engineering, Networks and Systems

Marina Papatriantafilou

Chalmers, Computer Science and Engineering, Networks and Systems

Parallel Computing

0167-8191 (ISSN)

Vol. 89 102552

HAREN: Self-distributed and adaptive data streaming analytics in the fog

Swedish Research Council (VR) (2016-03800), 2017-01-01 -- 2020-12-31.

INDEED

Chalmers, 2016-01-01 -- 2020-12-31.

STAMINA - GE

Göteborg Energi AB, 2017-01-01 -- 2021-12-31.

Cloud-based products and production (FiC)

Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2011)

Computer Science

DOI

10.1016/j.parco.2019.102552

More information

Last updated

2021-02-26
