Towards automated detection of data pipeline faults
Paper in proceedings, 2020

Data pipelines play an important role throughout the data management process. They automate the steps from data generation to data reception, thereby reducing human intervention. A failure or fault in a single step of a data pipeline has cascading effects that can result in hours of manual intervention and clean-up. Pipeline failures caused by faults at different stages of the pipeline are a common challenge that eventually leads to significant performance degradation of data-intensive systems. To ensure early detection of these faults and to increase the quality of the data products, continuous monitoring and fault detection mechanisms should be included in the data pipeline. In this study, we have explored the need for incorporating automated fault detection mechanisms and mitigation strategies at different stages of the data pipeline. Further, we identified faults at different stages of the data pipeline, along with possible mitigation strategies that can be adopted to reduce the impact of these faults and thereby improve the quality of data products. The idea of incorporating fault detection and mitigation strategies is validated by realizing a small part of the data pipeline through action research with the analytics team at a large software-intensive organization in the telecommunication domain.
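The abstract does not describe how faults are detected or mitigated at each stage, so the following is only an illustrative sketch of one common pattern, not the authors' method. Each stage validates incoming records against an expected schema, quarantines faulty records instead of failing the whole run, and halts if the fault rate exceeds a threshold. The names `run_stage`, `check_record`, `StageReport`, `PipelineFault`, the `max_fault_rate` parameter, and the example data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical record type: one row flowing through the pipeline.
Record = dict[str, Any]


class PipelineFault(Exception):
    """Raised when a stage's fault rate exceeds the allowed threshold."""


@dataclass
class StageReport:
    """Per-stage monitoring summary (hypothetical structure)."""
    stage: str
    passed: int = 0
    quarantined: list[Record] = field(default_factory=list)


def check_record(record: Record, required: dict[str, type]) -> bool:
    """Detect faults: a required field that is missing, None, or of the wrong type."""
    return all(isinstance(record.get(name), kind) for name, kind in required.items())


def run_stage(
    name: str,
    records: list[Record],
    transform: Callable[[Record], Record],
    required: dict[str, type],
    max_fault_rate: float = 0.1,
) -> tuple[list[Record], StageReport]:
    """Run one stage with fault detection and two simple mitigations."""
    report = StageReport(stage=name)
    output: list[Record] = []
    for record in records:
        if not check_record(record, required):
            # Mitigation 1: quarantine the faulty record instead of failing the run.
            report.quarantined.append(record)
            continue
        try:
            output.append(transform(record))
            report.passed += 1
        except (KeyError, TypeError, ValueError):
            # Records the transform cannot process are quarantined as well.
            report.quarantined.append(record)
    total = report.passed + len(report.quarantined)
    if total and len(report.quarantined) / total > max_fault_rate:
        # Mitigation 2: stop early so the fault does not cascade downstream.
        raise PipelineFault(f"{name}: {len(report.quarantined)}/{total} records faulty")
    return output, report


# Example data (hypothetical); the second record has a null value.
raw = [
    {"id": 1, "value": 12.5},
    {"id": 2, "value": None},
    {"id": 3, "value": 9.0},
]
clean, report = run_stage(
    "ingest", raw, lambda r: {**r, "value": r["value"] * 2},
    {"id": int, "value": float}, max_fault_rate=0.5,
)
print(clean)               # [{'id': 1, 'value': 25.0}, {'id': 3, 'value': 18.0}]
print(report.quarantined)  # [{'id': 2, 'value': None}]
```

Quarantining keeps a single bad record from stopping the whole run, while the fault-rate threshold stops a stage whose input is systematically broken before the fault spreads to later stages. The threshold value used here is arbitrary.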

fault tolerance

failure recovery

fault detection

anomalies

component

mitigation

robustness

data quality

data pipeline

Authors

Aiswarya Raj Munappy

Chalmers, Computer Science and Engineering, Software Engineering

Jan Bosch

Chalmers, Computer Science and Engineering, Software Engineering

Helena Holmström Olsson

Malmö University

Tian J. Wang

Ericsson AB

Proceedings - Asia-Pacific Software Engineering Conference, APSEC

1530-1362 (ISSN)

Vol. 2020-December, pp. 346-355, article no. 9359276
9781728195537 (ISBN)

27th Asia-Pacific Software Engineering Conference
Singapore, Singapore

Subject categories (SSIF 2011)

Other Computer and Information Science

Software Engineering

Bioinformatics and Systems Biology

DOI

10.1109/APSEC51365.2020.00043


Last updated

2021-03-26