Enhancing Localization, Selection, and Processing of Data in Vehicular Cyber-Physical Systems
Doctoral thesis, 2024

Connected devices at the edge of the Edge-to-Cloud (E2C) continuum produce ever-increasing amounts of data that hold the key to valuable use cases across a wide range of applications. In the vehicular domain, connected vehicles in large fleets (called Vehicular Cyber-Physical Systems, or VCPSs) sense and collect terabytes of data, such as time series and video, enabling applications ranging from predictive maintenance to autonomous driving. In VCPSs, the computing devices onboard vehicles are not dimensioned to process all the data produced onboard. At the same time, communication with the cloud, where computing resources are more readily available, relies on bandwidth-limited and costly carrier-operated cellular connectivity.
As transmitting all raw data to the cloud for analysis incurs increasing costs and processing latencies, and the edge devices lack the capability to perform all required data analyses, the questions of "Where" and "How" to process "Which" data become paramount and form the foundation of this thesis.
The first part of this thesis outlines my work by introducing relevant background topics, motivating the research questions, and describing the contributions of this thesis. These contributions are presented in the five chapters that make up the second part. In Chapter A, I present the DRIVEN framework, which consists of a novel lossy online time-series compression algorithm with tuneable bounded error for the edge, as part of an edge-to-cloud pipeline that includes online data clustering, and I evaluate the tradeoffs between data savings and the reduced analysis accuracy caused by lossy compression. In Chapter B, I show how our work on Data Localization helps to quickly and efficiently discover the vehicles in a connected fleet that hold data relevant to a user-defined analysis task. Chapter C proposes Ananke, the first forward provenance framework for Stream Processing, which enables the selection of relevant data from the streaming sources that are ubiquitous in VCPSs. In Chapter D, I present the Nona framework, which solves the problem of forward provenance for evolving sets of Stream Processing queries and thus enables data selection in modern analysis flows, where queries are constantly altered and redeployed. Finally, in Chapter E, I introduce a comprehensive list of requirements for a VCPS learning simulator, together with an implementation of such a simulator, which enables the efficient evaluation of distributed data analysis algorithms for connected vehicular networks.
This thesis takes significant steps toward utilizing edge resources more efficiently, while also laying the groundwork for the further development of novel distributed data analysis algorithms in VCPSs.

Vehicular Cyber-Physical Systems

Stream Processing

Edge-to-Cloud Continuum

Provenance

Distributed Data Analysis

HA2, Hörsalsvägen 4
Opponent: Prof. Nalini Venkatasubramanian, University of California, Irvine, United States of America

Author

Bastian Havers

Network and Systems

DRIVEN: A framework for efficient Data Retrieval and clustering in Vehicular Networks

Future Generation Computer Systems, Vol. 107 (2020), pp. 1-17

Journal article

Time- and Computation-Efficient Data Localization at Vehicular Networks' Edge

IEEE Access, Vol. 9 (2021), pp. 137714-137732

Journal article

Ananke: A Streaming Framework for Live Forward Provenance

Proceedings of the VLDB Endowment, Vol. 14 (2020), pp. 391-403

Journal article

Nona: A Framework for Elastic Stream Provenance

Havers, B., Papatriantafilou, M., Gulisano, V.

Proposing a framework for evaluating learning strategies in vehicular CPSs

Middleware 2022 Industrial Track: Proceedings of the 23rd International Middleware Conference Industrial Track, part of Middleware 2022 (2022), pp. 22-28

Paper in proceeding

Leveraging cars as computers on wheels

The development of new cars, and especially of new in-car functions, is increasingly data-driven. While the development of autonomous driving, for example, requires large amounts of data, other applications such as slippery-road detection need data very quickly in order to warn other drivers.
All this data originates in modern cars with their hundreds of sensors, including radar, cameras, and soon LiDAR. Through communication with the cloud, these cars form networks with thousands of members that hold valuable data.
Today, this data is usually stored temporarily on the vehicles before it is uploaded to the cloud. There, the data from many cars is pooled, preprocessed, and analyzed together.
As the amounts of data needed for development and produced by cars grow, more data must be sent from the cars to the cloud. This leads to increasing latencies and higher costs for gathering the data, as well as a higher load on the cloud for analyzing it.

This approach overlooks a valuable resource in these networks of cars: their onboard computers. In this thesis, I present novel solutions for leveraging this resource to support the cloud in analyzing car data, by detecting relevant data, summarizing it, or performing parts of the analysis before the data even reaches the cloud. This makes it possible to send less data and to generate results faster, which in turn enables the analysis of even more data with fewer resources.

AUTOSPADA (Automotive Stream Processing and Distributed Analytics) OODIDA Phase 2

VINNOVA (2019-05884), 2020-03-12 -- 2022-12-31.

BADA - On-board Off-board Distributed Data Analytics

VINNOVA (2016-04260), 2016-12-01 -- 2019-12-31.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2011)

Computer Systems

ISBN

978-91-8103-002-0

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5460

Publisher

Chalmers

Online
