Deterministic, Explainable and Resource-Efficient Stream Processing for Cyber-Physical Systems
Licentiate thesis, 2019

We are undeniably living in the era of big data, where people and machines generate information at an unprecedented rate. While processing such data can provide immense value, it can prove especially challenging because of the data's Volume, Variety and Velocity. Velocity can be particularly important in environments that need to respond to incoming data in near real-time, such as cyber-physical systems. In such cases, the batch processing paradigm, which requires all data to be persistently stored and available, might not be appropriate. Instead, it can be desirable to perform stream processing, where unbounded datasets of streaming data are processed in an online manner, generating results quickly and thus significantly benefiting applications with strict latency requirements. However, it can be challenging for stream processing to provide the same guarantees and ease of use as traditional batch processing systems. This thesis studies ways to alleviate this by introducing techniques that make stream processing more predictable, explainable, and resource-efficient.

In the first part of the thesis, we study determinism, which can guarantee predictable and reproducible results in stream processing, regardless of the runtime system characteristics. We present Viper, a module for stream processing frameworks that provides determinism with a minimal performance impact. In the second part, we study fine-grained data provenance, which links each streaming result with the inputs that led to its generation. Fine-grained data provenance can help make stream processing easier to understand and debug. Additionally, it can reduce storage and transmission costs by making it possible to maintain only the essential input data. We propose the GeneaLog framework, which provides fine-grained data provenance in stream processing with minimal overhead. In the third part of the thesis, we explore scheduling and its use in stream processing for controlling resource allocation and achieving specific performance goals. We develop Haren, a framework that can be integrated into stream processing frameworks to provide custom thread scheduling capabilities. We study Haren's efficiency and the facilities it offers users to control the resource allocation of a streaming system. We evaluate all three proposed frameworks with relevant real-world streaming use cases and illustrate their efficiency and ease of use.

Provenance

Stream Processing

Scheduling

Determinism

Room EB, EDIT Building, Hörsalsvägen 11, Campus Johanneberg, Chalmers
Opponent: Prof. Gabriele Mencagli, Department of Computer Science, University of Pisa, Italy

Author

Dimitrios Palyvos-Giannas

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Viper: A module for communication-layer determinism and scaling in low-latency stream processing

Future Generation Computer Systems, Vol. 88 (2018), p. 297–308

Journal article

D. Palyvos-Giannas, V. Gulisano, and M. Papatriantafilou. GeneaLog: Fine-Grained Data Streaming Provenance in Cyber-Physical Systems

Haren: A Framework for Ad-Hoc Thread Scheduling Policies for Data Streaming Applications

Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems (2019), p. 19–30

Paper in proceedings

HARE: Self-deploying and Adaptive Data Streaming Analytics in Fog Architectures

Swedish Research Council (VR), 2017-01-01 -- 2020-12-31.

Subject Categories

Computer Engineering

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Technical report L - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 200

Publisher

Chalmers University of Technology


More information

Latest update

9/6/2019