Modelling Data Pipelines
Paper in proceeding, 2020

Data is the new currency and a key to success. However, collecting high-quality data from multiple distributed sources requires considerable effort, and transporting data from its source to its destination poses several further challenges. Data pipelines are implemented to increase the overall efficiency of data flow from source to destination: because they are automated, they reduce the human involvement that would otherwise be required. Despite existing research on ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) pipelines, research on data pipelines themselves remains limited. ETL/ELT pipelines are abstract representations of end-to-end data pipelines. To utilize the full potential of a data pipeline, we should understand the activities in it and how they are connected end to end. This study gives an overview of how to design a conceptual model of a data pipeline, which can then serve as a language of communication between different data teams. Furthermore, the model can be used to automate monitoring, fault detection, mitigation and alarming at different steps of the data pipeline.

Data pipelines

agile methodology

conceptual model

domain specific language

data workflow

Authors

Aiswarya Raj Munappy

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Jan Bosch

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Helena Holmström Olsson

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Tian J. Wang

Ericsson

Proceedings - 46th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020

13-20
9781728195322 (ISBN)

SEAA 2020 - 46th Euromicro Conference on Software Engineering and Advanced Applications
Ljubljana, Slovenia

HoliDev - Holistic DevOps Framework

VINNOVA (2017-05218), 2018-01-01 -- 2019-12-31.

Subject Categories

Other Computer and Information Science

Media Engineering

Computer Science

DOI

10.1109/SEAA51224.2020.00014

More information

Latest update

1/3/2024