FORTE: an extensible framework for robustness and efficiency in data transfer pipelines
Paper in proceeding, 2023

In the age of big data and growing product complexity, it is common to monitor many aspects of a product or system in order to extract well-founded intelligence, draw conclusions, and continue driving innovation. Automating and scaling processes in data pipelines becomes essential to keep pace with the increasing rates of data generated by such practices, while meeting security, governance, scalability and resource-efficiency demands. We present FORTE, an extensible framework for robustness and transfer efficiency in data pipelines. We identify sources of potential bottlenecks and explore the design space of approaches to deal with the challenges they pose. We study and evaluate the synergetic effects of data compression, in-memory processing and task scheduling in relation to pipeline performance. A prototype of FORTE is implemented and studied in a use case at Volvo Trucks for high-volume production-level data sets, on the order of hundreds of gigabytes to terabytes per burst. Various general-purpose lossless data compression algorithms are evaluated in order to balance compression effectiveness and time in the pipeline. All in all, FORTE makes it possible to deal with these trade-offs and achieve benefits in latency and sustainable rate (up to 1.8 times better) and in the effectiveness of resource utilisation, all while enabling additional features such as integrity verification, logging, monitoring and traceability, as well as cataloguing of transferred data. We also note that the resource-efficiency improvements achievable with FORTE, and its extensibility, can imply further benefits regarding scheduling, orchestration and energy efficiency in such pipelines.
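
The trade-off between compression effectiveness and compression time mentioned in the abstract can be illustrated with a minimal sketch. It is not FORTE's implementation: the choice of codecs (Python's standard-library `zlib`, `bz2` and `lzma`), the function name `compare_codecs` and the synthetic sample payload are all assumptions made for illustration only.

```python
import bz2
import lzma
import time
import zlib


def compare_codecs(payload: bytes) -> dict:
    """Measure compression ratio and compression time for a few lossless codecs.

    Hypothetical helper for illustration; the codecs evaluated in the
    paper are not listed in this record.
    """
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(payload)
        elapsed = time.perf_counter() - start
        # Lossless check: decompressing must give back the exact input.
        if decompress(packed) != payload:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "ratio": len(payload) / len(packed),  # higher is more effective
            "seconds": elapsed,                   # lower is faster
        }
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; real pipeline data will behave differently.
    sample = b"timestamp=0;signal=engine_speed;value=1200\n" * 50_000
    for name, stats in compare_codecs(sample).items():
        print(f"{name}: ratio={stats['ratio']:.1f}, time={stats['seconds']:.4f}s")
```

A codec with a higher ratio reduces the volume to transfer but may spend more time compressing, so which one suits a pipeline depends on the available bandwidth and CPU budget.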

resource utilization

distributed processing

data transfer efficiency

data pipelines

internet of things

edge computing

Authors

Martin Hilgendorf

Networks and Systems

Vincenzo Massimiliano Gulisano

Networks and Systems

Marina Papatriantafilou

Networks and Systems

Jan Engström

Volvo Group

Binay Mishra

Volvo Group

DEBS 2023 - Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems

139-150
9798400701221 (ISBN)

17th ACM International Conference on Distributed and Event-based Systems, DEBS 2023
Neuchatel, Switzerland

Relaxed Semantics Across the Data Analytics Stack (RELAX)

European Commission (EU) (EC/H2020/101072456), 2023-03-01 -- 2027-02-28.

EPITOME - Summarization and structuring of continuous data in pipelines for concurrent processing

Swedish Research Council (VR) (2021-05424), 2022-01-01 -- 2025-12-31.

Subject categories (SSIF 2011)

Computer and Information Science

Electrical Engineering and Electronics

DOI

10.1145/3583678.3596892

More information

Last updated

2023-09-22