Deterministic, Explainable and Resource-Efficient Stream Processing for Cyber-Physical Systems
Licentiate thesis, 2019

We are undeniably living in the era of big data, where people and machines generate information at an unprecedented rate. While processing such data can provide immense value, it can prove especially challenging because of the data's Volume, Variety and Velocity. Velocity can be particularly important in environments that need to respond to incoming data in near real-time, such as cyber-physical systems. In such cases, the batch processing paradigm, which requires all data to be persistently stored and available, might not be appropriate. Instead, it can be desirable to perform stream processing, where unbounded datasets of streaming data are processed in an online manner, generating results quickly and thus significantly benefiting applications with strict latency requirements. However, it can be challenging for stream processing to provide the same guarantees and ease-of-use as traditional batch processing systems. This thesis studies ways to alleviate this by introducing techniques that make stream processing more predictable, explainable, and resource-efficient.

In the first part of the thesis, we study determinism, which can guarantee predictable and reproducible results in stream processing, regardless of the runtime system characteristics. We present Viper, a module for stream processing frameworks that provides determinism with minimal performance impact. In the second part, we study fine-grained data provenance, which links each streaming result with the inputs that led to its generation. Fine-grained data provenance can help make stream processing easier to understand and debug. Additionally, it can reduce storage and transmission costs by making it possible to retain only the essential input data. We propose the GeneaLog framework, which provides fine-grained data provenance in stream processing with minimal overhead. In the third part of the thesis, we explore scheduling and its use in stream processing for controlling resource allocation and achieving specific performance goals. We develop Haren, a framework that can be integrated into stream processing frameworks to provide custom thread scheduling capabilities. We study Haren's efficiency and the facilities it offers for controlling the resource allocation of a streaming system. We evaluate all three proposed frameworks with relevant real-world streaming use cases and illustrate their efficiency and ease of use.

Provenance

Stream Processing

Scheduling

Determinism

Room EB, EDIT Building, Hörsalsvägen 11, Campus Johanneberg, Chalmers
Opponent: Prof. Gabriele Mencagli, Department of Computer Science, University of Pisa, Italy

Author

Dimitrios Palyvos-Giannas

Chalmers, Computer Science and Engineering, Networks and Systems

Viper: A module for communication-layer determinism and scaling in low-latency stream processing

Future Generation Computer Systems, Vol. 88 (2018), p. 297-308

Journal article

D. Palyvos-Giannas, V. Gulisano, and M. Papatriantafilou. GeneaLog: Fine-Grained Data Streaming Provenance in Cyber-Physical Systems

Haren: A Framework for Ad-Hoc Thread Scheduling Policies for Data Streaming Applications

Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems (2019), p. 19-30

Paper in proceedings

HAREN: Self-distributing and adaptive data stream analytics in the fog

Swedish Research Council (VR) (2016-03800), 2017-01-01 to 2020-12-31.

Subject categories

Computer Engineering

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Technical report L - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 200

Publisher

Chalmers

More information

Last updated

2019-09-06