Modelling Data Pipelines
Paper in proceedings, 2020

Data is the new currency and a key to success. However, collecting high-quality data from multiple distributed sources requires considerable effort, and transporting data from its source to its destination brings several further challenges. Data pipelines are implemented to increase the overall efficiency of data flow from source to destination, because they are automated and reduce the human involvement that would otherwise be required. Although research exists on ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) pipelines, research on data pipelines as a whole remains limited. ETL/ELT pipelines are abstract representations of end-to-end data pipelines. To utilize the full potential of a data pipeline, we need to understand the activities in it and how they are connected end to end. This study gives an overview of how to design a conceptual model of a data pipeline, which can then serve as a language of communication between different data teams. Furthermore, the model can be used to automate monitoring, fault detection, mitigation, and alarming at different steps of the data pipeline.

domain specific language

conceptual model

Data pipelines

data workflow

agile methodology

Authors

Aiswarya Raj Munappy

Chalmers, Computer Science and Engineering, Software Engineering, Software Engineering for Testing, Requirements, Innovation and Psychology

Jan Bosch

Chalmers, Computer Science and Engineering, Software Engineering, Software Engineering for Testing, Requirements, Innovation and Psychology

Helena Holmström Olsson

Chalmers, Computer Science and Engineering, Software Engineering

Tian J. Wang

Ericsson AB

2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)

2169-3536 (ISSN)

SEAA2020 - 46th Euromicro Conference on Software Engineering and Advanced Applications
Ljubljana, Slovenia

HoliDev - Holistic DevOps Framework

VINNOVA, 2018-01-01 -- 2019-12-31.

Subject Categories

Other Computer and Information Science

Media Engineering

Computer Science

DOI

10.1109/SEAA51224.2020.00014

More information

Created

2020-11-01